Overall objectives

Short-term,  episodic  visual  recognition  memory  is  crucial  to  the  success  of  most 
activities  of  everyday  life.  The  ability  to  recognize  and  evaluate  recently-seen  objects 
and  events  makes  it  possible  to  prepare  and  then  execute  appropriate  actions  in  a 
timely  fashion.  Because  this  crucial  capacity  spans  two  distinctly  different  research 
traditions,  vision  and  memory,  it  has  received  far  less  research  effort  than  it  deserves. 
With  the  interdependence  of  vision  and  memory  firmly  in  mind,  we  are  attempting 
to  synthesize  concepts,  insights,  and  methods  from  memory  research,  and  from  vision 
research,  working  within  a  coherent,  quantitative  framework  for  understanding  visual 
recognition  memory,  particularly  for  stimuli  that  resist  rehearsal. 

The reporting period runs from July 1, 2003 to December 31, 2004.
During this period we made strong progress on both of our research objectives,
and on some supplementary objectives as well.

Objective One

Examine episodic visual recognition memory for complex, high-dimensional stimuli,
that is, synthetic human faces (Wilson, Loffler, & Wilkinson, 2002). Outside the
laboratory, nearly all inputs to memory can only be defined in high-dimensional
stimulus spaces. This is true of verbal material (words), natural sounds, and scenes.
To date, recognition memory research supported by this project has focused on simple,
low-dimensional stimuli, that is, 2- or 3-dimensional compound gratings. At issue is
(1) whether NEMo (Kahana & Sekuler, 2002) and/or related models can account for
recognition of more complex stimuli, which have higher dimensionality, and (2) whether
the general principles that govern memory for low-dimensional stimuli hold also for
high-dimensional stimuli.

Objective  Two 

Examine the relationship between recognition and identification (that is, source
memory). Guided by our computational model, NEMo, and our prior findings, work on
this objective developed and tested a new paradigm for studying the link between
recognition of particular, previously seen stimuli, and the ability to identify the
context in which the stimuli were encountered. Specifically, we collected and modeled
concurrent recognition and identification judgments, applying signal detection and
information theory analyses to the results.

Status  of  Effort 

During  the  project  period  we  collected  substantial  amounts  of  data  and  carried 
out  model-driven  analyses  related  to  many  of  the  project’s  objectives. 
Objective One

Four  experiments  confirmed  that  high-dimensional  stimuli  (synthetic  human  faces) 
can  provide  useful  insights  into  episodic  recognition  memory.  Our  initial  data  set 
replicated  some,  but  not  all,  basic  phenomena  found  previously  with  simpler  stimuli. 
Using  an  eye-tracker  funded  primarily  by  this  project,  we  examined  the  pattern  of 
fixations  that  subjects  made  while  they  scrutinized  and  encoded  the  face  stimuli  for 
memory.  We  developed  a  reliable  method  for  generating  similarity  judgments,  which 
can  be  transformed  via  multidimensional  scaling  into  a  representation  of  perceptual 
distances  among  faces.  A  manuscript  reporting  this  work  is  under  review. 

Objective  Two 

Three experiments examined connections between recognition and identification,
a key attribute of visual episodic memory. With compound gratings as
study and probe items, subjects judged whether a probe had or had not been
presented in the immediately preceding study series (a recognition judgment), and
also identified the serial position of the study item that matched the probe (an
identification judgment). Recognition and identification responses were expressed
on a visual analogue scale. Approximately 75% of correct recognitions were
accompanied by correct identification of the serial position of the study item that
matched the probe. This suggests that recognition and identification are based
on  a  common  source  of  memory  information.  Misidentifications  were  attributable 
to  two  factors:  perceptual  similarity  between  the  wrongly  identified  study  item 
and  the  correct  study  item,  and  the  temporal  proximity  of  the  wrongly  identified 
item  to  the  correct  one.  We  used  receiver  operating  characteristics  (ROCs)  to 
compare  recognition  performance  across  the  three  experiments.  By  combining  signal 
detection  theory  (Wickens,  2002)  and  the  summed  similarity  framework  embodied 
in  NEMo,  we  were  able  to  account  for  the  unusual  slopes  of  our  z-transformed  ROCs. 


Accomplishments/New  Findings 

Work  on  Objective  One:  Recognition  Memory  for  Synthetic  Faces. 

Like  other  strongly-social  creatures,  Homo  sapiens  have  a  knack  for  recognizing 
previously-encountered  conspecifics.  Even  a  brief  glimpse  of  some  face  can  be  enough 
to  allow  a  viewer  to  recognize  that  the  face  had  been  seen  before  (Lehky,  2000). 
Going  beyond  this  simple  judgment  of  familiarity,  additional,  episodic  information 
makes  it  possible  to  judge  the  circumstances  under  which  that  glimpsed  face  had 
been  seen.  To  examine  visual  episodic  memory  for  faces  we  adapted  Sternberg’s 
paradigm  (1966,  1975),  which  can  bridge  visual  psychophysics  and  memory,  research 
domains  that  bracket  visual  recognition  memory. 

Our  research  is  driven  by  a  computational  model  that  has  successfully  accounted 
for  episodic  memory  with  low-dimensional  stimuli,  compound  sinusoidal  gratings 
that vary in spatial frequency and phase (Kahana & Sekuler, 2002). This model,
NEMo  (Noisy  Exemplar  Model),  provides  a  framework  for  understanding  how  briefly- 
presented  stimuli  are  represented  in  memory,  and  for  identifying  the  way  that  stored 
memories  are  transformed  into  recognition  judgments.  Here,  we  apply  NEMo  to 
episodic  memory  for  specially-designed  human  faces.  Unlike  the  case  for  compound 
sinusoidal  gratings,  essential  aspects  of  visual  processing  of  human  faces  take  place 
several  synapses  beyond  the  primary  visual  cortex.  Because  the  primary  visual  cortex 
participates not only in visual encoding but also in visual memory and related
phenomena (Magnussen & Greenlee, 1999; Kosslyn, Thompson, Kim, & Alpert, 1995;
Klein, Paradis, Poline, Kosslyn, & Le Bihan, 2000), we were interested in the
possibility that differences between visual processing of compound gratings and of human
faces  might  produce  corresponding  differences  in  recognition  memory  for  the  two  kinds 
of  stimuli  (Hole,  1996). 

As  readers  may  be  unfamiliar  with  the  class  of  face  stimuli  we  used,  and  because 
the  characteristics  of  these  stimuli  are  central  to  our  research,  it  is  worthwhile  to 
describe  those  stimuli  in  some  detail.  Wilson  et  al.  (2002)  devised  a  method  for 
generating  synthetic  faces  that  are  ideal  stimuli  for  model-driven  research  on  visual 
memory.  Individual  synthetic  faces  are  derived  from  gray-scale  face  photographs  by 
digitizing  37  key  points:  14  points  defining  head  shape,  9  points  for  the  hairline, 
4  points  for  eye  locations,  4  points  for  nose  length  and  width,  5  points  defining  the 
mouth  and  lips,  and  one  point  for  brow  height.  Synthetic  faces  are  then  reconstructed 
from  these  37  measurements  and  bandpass  filtered  with  a  2.0  octave  wide  difference 
of  Gaussians  filter  with  a  peak  frequency  of  10.0  cycles  per  face  width  (Wilson  et  al., 
2002).  Several  studies  have  shown  that  such  filtering  is  optimal  for  face  recognition 
(Gold, Bennett, & Sekuler, 1999; Näsänen, 1999).

By design, synthetic faces eliminate textures such as skin, hair, and wrinkles, and
focus instead on geometric characteristics of faces. However, this raises the
important question of whether synthetic faces are sufficiently accurate representations
of the original faces to be useful in psychophysical experimentation. This question has
been answered by requiring observers to identify the gray-scale photograph from which a
synthetic face was derived in a four-alternative forced-choice experiment. The mean
across five observers was 97.4% correct in matching between front-view synthetic
faces and photographs, and even for matching between 20° side-view photographs and
front-view synthetic faces (or vice versa) performance averaged 90.7% correct (Wilson
et al., 2002). As chance performance is 25% in these experiments, these data clearly
demonstrate  that  synthetic  faces  capture  salient  aspects  of  individual  face  geometry. 
Furthermore, the database captures known gender differences in faces: synthetic female
faces have significantly smaller heads, rounder chins, thicker lips, and higher eyebrows
than male faces. Finally, fMRI signals from the fusiform face area show that synthetic
faces produce BOLD activation that is 85% as large as that produced by the original
gray-scale faces from which they are derived (manuscript in preparation). Using just 37
measurements strikes a balance between stimulus simplicity and encoding sufficient
geometric information to characterize individual faces. This essential individuality,
which is important in episodic memory, is absent from some commonly used face stimuli,
such as Brunswik faces (Brunswik & Reiter, 1937; Sigala, Gabbiani, & Logothetis, 2002;
Peters, Gabbiani, & Koch, 2003).

The use of Wilson faces minimizes some problems associated with other collections
of face stimuli (Duchaine & Weidenfeld, 2003). Furthermore, small graded differences
among the faces allowed us to introduce variability within individuals, and prevented
subjects from learning and naming each face, which could subvert mnemonic reliance
on visual information (for example, Ashby & Ell, 2001). To reinforce reliance on
visual  information  per  se,  we  limited  rehearsal  by  giving  subjects  only  a  brief  glimpse 
of  each  face,  and  then  allowing  only  a  short  interval  between  successive  faces. 

To  preview,  our  first  two  experiments  establish  key  properties  of  visual  short-term 
memory  for  synthetic  faces.  Experiment  1  characterizes  the  effects  on  recognition  of 
serial  position,  list  length,  and  similarity  of  the  study  items  to  probe.  This  experiment 
also  explores  memory  decay  resulting  from  the  sequential  occurrence  of  events,  over 
and  above  the  influence  of  similarities  among  items.  Experiment  2  evaluates  the  effect 
of  category-membership  on  recognition  memory  for  synthetic  faces,  and  evaluates 
the  assumption  that  perceptual  categorical  effects  are  insignificant  or  absent.  In 
Experiment 3 we employ a design that allows us to use the NEMo model of visual
short-term recognition memory to account for performance on individual lists (sets
of stimulus items). The model was applied in several alternative modes, for
example expressing “similarity” either in terms of faces’ physical coordinates, or in
terms of perceptual coordinates, assessed using multidimensional scaling. Guided by
NEMo,  Experiments  3  and  4  allowed  us  to  characterize  the  roles  of  (1)  the  similarity 
of  probe  to  the  list  of  study  items,  (2)  the  similarities  among  each  of  the  list  items, 
and  (3)  perceptual  noise  in  predicting  recognition  judgments  on  individual  lists. 


Experiment  1 

Our  first  experiment  assessed  the  general  suitability  of  multidimensional,  synthetic 
face  stimuli  for  use  in  studies  of  episodic  recognition  memory.  We  wanted  to  determine 
whether, when similar methods were used, findings from memory studies using
low-dimensional stimuli, such as compound gratings, could be replicated using
higher-dimensional stimuli whose processing is known to engage regions of the brain that
are not notably involved in processing compound gratings (for example, Kanwisher,
McDermott, & Chun, 1997; Druzgal & D’Esposito, 2001, 2003).

We  also  empirically  estimated  the  similarity-distance  function  for  synthetic  faces. 
This  tuning  function  describes  the  relationship  between  recognition  performance  and 
pairwise  Euclidean  distance  between  faces.  We  planned  to  apply  this  empirically 
estimated  similarity  tuning  function  in  modeling  subsequent  experiments;  the  aim 
was  to  reduce  the  number  of  free  parameters  in  the  model. 
Stimuli The Wilson faces used in all our experiments were derived from
photographs of three Caucasian females whom we designate A, B, and C. From $m_A$,
the vector of 37 measurements taken on actual face A, Wilson’s procedure
synthesizes a realistic version of that face in a stimulus space of high dimensionality
(n = 37). Vectors of measurements taken on faces A, B and C were transformed so as to
be mutually orthogonal, by Gram-Schmidt orthogonalization (Diamantaras & Kung, 1996;
Principe, Euliano, & Lefebvre, 2000). Consequently, variation in one face’s geometric
properties is independent of the variation in geometric properties of the other two
faces (Wilson et al., 2002).
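
The orthogonalization step can be illustrated with a minimal sketch. It assumes the three measurement vectors are ordinary 37-element arrays; the random vectors and the function name `gram_schmidt` are hypothetical stand-ins, not the data or code used in the project.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return mutually orthogonal versions of the input vectors."""
    ortho = []
    for v in vectors:
        w = np.asarray(v, dtype=float).copy()
        for u in ortho:
            # Remove the component of w that lies along the already-accepted vector u.
            w -= (w @ u) / (u @ u) * u
        ortho.append(w)
    return np.array(ortho)

# Hypothetical stand-ins for the 37-measurement vectors of faces A, B, and C.
rng = np.random.default_rng(0)
faces = rng.normal(size=(3, 37))

ortho_faces = gram_schmidt(faces)
gram = ortho_faces @ ortho_faces.T  # off-diagonal entries are ~0 after orthogonalization
```

The first vector is left unchanged and each later vector loses only its projections onto the earlier ones, so the result depends on the order in which the faces are processed.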

After  pre-processing  and  normalization,  vectors  of  measurements  from  several 
different faces can be combined to generate $m_{avg}$, the vector of measurements for
an average face. To illustrate, the synthesized mean female face is shown at the
left of Figure 1. This mean face is derived from a sample of 40 Caucasian female
faces. Summing $k\,m_{avg}$ and $(1 - k)\,m_A$, for some k, 0 < k < 1, generates a face
that  is  a  mixture  of  the  mean  face  and  some  individual  face,  A.  Allowing  k  to  vary, 
k  =  0 ...  1,  generates  a  graded  series  of  faces,  which  spans  a  continuum  from  the 
mean  synthesized  face  (when  k  =  1)  to  a  synthesized  version  of  face  A  alone  (when 
k  =  0).  The  same  operation  also  can  generate  a  graded  series  of  faces,  which  span 
the  distance  from  the  mean  face  toward  any  other  face,  here,  toward  B  or  C.  The 
relatively  small  pairwise  distances  between  faces,  together  with  the  orthogonalization 
of  the  face  space,  make  it  convenient  to  manipulate  the  similarity  of  one  stimulus  to 
another,  a  variable  known  to  be  important  in  recognition  memory. 

The  graded  series  for  faces  A-C  are  shown  in  the  upper  three  rows  of  Figure  1. 
Within  each  row,  (1  —  k)  ranges  from  0.04  to  0.20,  in  increments  of  0.04,  which  means 
that each face differs from its nearest neighbor by approximately the mean
discrimination threshold taken under viewing conditions similar to the ones we used (Wilson
et  al.,  2002).  In  geometric  terms,  faces  A-C  lie  along  the  mutually-perpendicular  axes 
of  a  3D  space,  with  the  mean  face  at  the  origin. 
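
As a rough illustration of the blending rule and the 0.04 steps described above, the following sketch builds a graded series for one face. The measurement vectors are random placeholders and `blend_face` is a hypothetical helper name; only the weighting rule and the step sizes come from the text.

```python
import numpy as np

def blend_face(m_avg, m_face, k):
    """Mix the mean face and an individual face: k * m_avg + (1 - k) * m_face."""
    m_avg = np.asarray(m_avg, dtype=float)
    m_face = np.asarray(m_face, dtype=float)
    return k * m_avg + (1.0 - k) * m_face

# Placeholder measurement vectors (37 values each), for illustration only.
rng = np.random.default_rng(1)
m_avg = rng.normal(size=37)
m_A = rng.normal(size=37)

# Graded series with (1 - k) = 0.04, 0.08, ..., 0.20.
steps = [0.04 * i for i in range(1, 6)]
series_A = [blend_face(m_avg, m_A, 1.0 - step) for step in steps]

# Distance of each blended face from the mean grows linearly with (1 - k).
offsets = [float(np.linalg.norm(face - m_avg)) for face in series_A]
```

Because the blend is linear, each face's distance from the mean is (1 - k) times the distance between the mean and the individual face, which is what makes equal steps in k correspond to equal steps in physical distance.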

A  final  set  of  faces,  D,  was  generated  by  averaging  corresponding  exemplars  of  A, 
B,  and  C.  The  resulting  faces  are  shown  in  the  bottom  row  of  Figure  1.  Geometrically, 
the  faces  in  row  D  lie  along  the  diagonal  of  face  space,  which  means  that  the  geometric 
properties  of  the  faces  in  D  are  equally  well  correlated  with  the  geometric  properties 
of  each  of  the  other  faces,  A,  B,  and  C.  Figure  2A  shows  the  geometric  arrangement 
of all 21 synthetic faces in a space of three orthogonal dimensions.
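
The geometry described here can be sketched numerically. The sketch assumes that coordinates are expressed in units of (1 - k), that faces A, B, and C lie on unit axes, and that row D is the plain average of the corresponding exemplars; the labels such as `A1` and the helper `distance` are hypothetical, and the actual scaling used in the report may differ.

```python
import numpy as np
from itertools import combinations

steps = np.arange(1, 6) * 0.04  # (1 - k) values: 0.04 ... 0.20
axes = {
    "A": np.array([1.0, 0.0, 0.0]),
    "B": np.array([0.0, 1.0, 0.0]),
    "C": np.array([0.0, 0.0, 1.0]),
}

coords = {"mean": np.zeros(3)}  # the mean face sits at the origin
for name, axis in axes.items():
    for i, step in enumerate(steps, start=1):
        coords[f"{name}{i}"] = step * axis
for i in range(1, 6):
    # Row D: average of the corresponding exemplars of A, B, and C.
    coords[f"D{i}"] = (coords[f"A{i}"] + coords[f"B{i}"] + coords[f"C{i}"]) / 3.0

def distance(a, b):
    """Euclidean distance between two labeled faces."""
    return float(np.linalg.norm(coords[a] - coords[b]))

# All pairwise distances among the 21 faces.
pairwise = {(a, b): distance(a, b) for a, b in combinations(coords, 2)}
```

Under these assumptions the D faces project equally onto all three axes, and a D face at a given step lies closer to the mean (by a factor of 1/√3) than the A, B, or C face at the same step.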

To  prevent  contamination  of  face  recognition  by  emotional  cues,  our  Wilson-faces 
incorporated standard generic shapes for features that would change shape as
emotions are expressed, e.g., eyes, mouth and brows (Ekman & Friesen, 1975). In addition
to  having  a  consistent  neutral  expression,  these  faces  were  equated  on  dimensions  such 
as  contrast  and  mean  luminance,  which  eliminates  such  attributes  as  aids  to  memory. 

Figure 1. Face stimuli used in Experiment 1. Constructed after the method of Wilson
et al. (2002), the stimulus labeled ‘mean’ is the average of 40 female faces. In the matrix of
faces, rows A-C show faces derived from three different faces; row D shows faces that are
the means of faces A-C. Over the matrix columns 1, ..., 5, faces deviate increasingly from
the mean, by 0.04 in column 1 through 0.20 in column 5. For additional details, see the
text.

Procedure On each trial, one to four faces, the study series, were followed by
a single probe face (p). The faces in the study series comprised the items to be
remembered for that trial. Subjects judged whether p had been among the items
in the study series. We use the term target to designate a p that had been in the
study series, and the term lure to designate a p that had not been in the study series.
Correspondingly, we can designate any trial as either a target trial or a lure trial.
Because the study series varied from trial to trial, subjects were forced to base each
“yes”-“no” recognition judgment on the items they had just seen.

Each study face was presented for 110 msec, with an inter-stimulus interval of 200
msec. The use of brief presentations was inspired by Wilson et al.’s (2002) use of
this same duration in their studies of face discrimination, and by the fact that fairly
detailed processing of a face can be completed within the first 100 msec of viewing
(Lehky, 2000).
Figure 2. Representations of face stimuli in alternative three-dimensional spaces. In each
panel, the star indicates the location of the mean face; squares, diamonds, circles, and
triangles represent faces A, B, C and D, respectively. The numbers 1-5 designate faces’
distance from the mean; the numbers correspond to the columns in Figure 1. Panel A.
The 21 face stimuli arranged according to the Euclidean distances between faces’ physical
descriptions, that is, the physical distance of each face from the mean. Panel B. Arrangement
of  the  stimulus  faces  in  a  three-dimensional  perceptual  space,  using  the  MDS  solution  to 
position  each  face  (Expt.  3).  The  MDS  space  shown  here  has  been  Procrustes  transformed 
to  bring  its  dimensions  into  line  with  those  of  the  space  shown  in  Panel  A. 


A  warning  tone  followed  the  study  series.  Then,  1200  msec  later,  a  p  face  was 
presented for 110 msec. To prevent potential within-category interference, in
composing lists of study items only one study face could come from any one category
of face, A...D. For each study list, p was chosen at random from the entire set of
faces, with two constraints. First, on half of all trials, p was forced to replicate one
of the items in the study set (on the other half, p differed from all study items).
Second, when p had been among the study items, it matched each of those items with
equal frequency. Prior to the experiment a pool of 7477 unique stimulus series
was randomly generated. Each subject was tested with a unique sample of series
drawn randomly from this large pool, subject to constraints of list length and
equal proportions of target and lure trials. Distinctive tones following each response
gave subjects trial-wise knowledge of results.
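
The list-construction constraints can be sketched as follows. This is not the project's generation code: the face labels, the exclusion of the mean face from the pool, and the equal split of trials across list lengths are all assumptions made for illustration, and `make_trial` and `make_block` are hypothetical names.

```python
import random

CATEGORIES = ["A", "B", "C", "D"]
LEVELS = [1, 2, 3, 4, 5]
ALL_FACES = [f"{c}{i}" for c in CATEGORIES for i in LEVELS]  # assumed pool (mean face omitted)

def make_trial(list_length, is_target, rng):
    """Return (study_series, probe) for one trial."""
    # At most one study face per category, so sample categories without replacement.
    cats = rng.sample(CATEGORIES, list_length)
    study = [f"{c}{rng.choice(LEVELS)}" for c in cats]
    if is_target:
        # Each serial position is equally likely to hold the matching item.
        probe = rng.choice(study)
    else:
        # Lure: any face that did not appear in the study series.
        probe = rng.choice([f for f in ALL_FACES if f not in study])
    return study, probe

def make_block(trials_per_cell, seed=0):
    """Equal numbers of target and lure trials at each list length (1 to 4)."""
    rng = random.Random(seed)
    trials = [
        make_trial(n, is_target, rng)
        for n in (1, 2, 3, 4)
        for is_target in (True, False)
        for _ in range(trials_per_cell)
    ]
    rng.shuffle(trials)
    return trials

block = make_block(trials_per_cell=72)  # 8 cells x 72 = 576 trials (assumed split)
```

Drawing targets uniformly over serial positions only equates the positions in expectation; exact balance would require counterbalancing rather than random choice.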

Although there were small differences in size from face to face, each was
approximately 5.5 degrees high by 3.8 degrees wide. To eliminate the usefulness of
vernier-type  cues,  the  vertical  and  horizontal  position  of  each  face  was  perturbed  by 
adding  a  pair  of  random  displacements  drawn  from  a  uniform  distribution  with  mean 
=  12.6  minarc,  and  range  =  2.1  -  23.1  minarc. 

Subjects served in three one-hour sessions of 576 trials each. During testing, a
subject  sat  with  head  supported  by  a  chin-and-forehead  rest,  viewing  the  computer 
display  binocularly  from  a  distance  of  114  cm.  Trials  were  self-initiated. 


ROBERT  SEKULER,  PRINCIPAL  INVESTIGATOR 




Subjects Subjects were five male and five female volunteers whose ages ranged
from 19 to 30 years. They had normal or corrected-to-normal visual acuity as
measured with Snellen targets, and normal contrast sensitivity as measured with
Pelli-Robson charts (Pelli, Robson, & Wilkins, 1988). Subjects were naive with respect to
the  study’s  purposes. 

Apparatus  Stimuli  were  generated  and  displayed  using  Matlab  and  extensions  from 
the  Psychophysics  and  Video  Toolboxes  (Brainard,  1997;  Pelli,  1997).  Stimuli  were 
presented  on  a  15-inch  computer  monitor  with  a  refresh  rate  of  95  Hz,  and  a  resolution 
set  to  800  by  600  pixels.  Routines  from  the  Video  Toolbox  calibrated  and  linearized 
the display. Mean screen luminance was fixed at 36 cd/m².

Results  Recognition  memory  performance  is  known  to  be  influenced  by  the  length 
of  the  study  series  and,  on  target  trials,  by  the  serial  position  of  the  study  item  that 
matches  p  (Sternberg,  1975;  Murdock,  1982).  We  therefore  examined  performance  as 
a  joint  function  of  these  two  variables.  As  study  series  lengthened,  overall  performance 
declined, F(3, 27) = 49.57, p < 0.001. The proportion of correct responses did not differ
significantly between the two types of trials, targets and lures, F(1, 9) = 5.02, p > .05.

For  target  trials,  as  list  length  varies,  the  effect  of  serial  position  is  best  understood 
by examining performance not as a function of serial position per se, but as a function
of lag, the number of study items between p and the study item that it matched (for
example, Kahana & Sekuler, 2002). The data at the left side of Figure 3 show subjects’
mean recognition performance as a function of lag for each study list length, LL = 1-4.
Note that lag = 0 signifies that p matched the last study item in the series, that
is,  no  study  items  intervened  between  the  two.  (Data  at  the  right  side  of  that  figure 
represent  performance  on  lure  trials.)  Proportion  correct  on  target  trials  varies  with 
lag  position,  with  highest  values  achieved  for  lag=0,  when  the  matching  study  item 
appeared  at  the  very  end  of  the  series.  With  one  or  more  study  items  intervening, 
performance  declined  (lag  =  1,  2  or  3).  This  result  constitutes  a  recency  effect. 
Note  that  none  of  the  series  lengths  showed  a  primacy  effect,  that  is,  an  upswing 
in  performance  for  the  study  item  presented  first.  This  serial  position  effect  was 
consistent  with  the  one  reported  by  Kerr,  Ward,  and  Avons  (1999)  using  gray  scale 
faces.  We  conjecture  that  the  absence  of  any  primacy  effect  resulted  from  the  rapid 
presentation  of  stimuli,  which  inhibited  rehearsal  and  therefore  the  appearance  of  a 
primacy  effect  (Ward,  2002). 
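
The relation between serial position and lag can be made concrete with a short sketch. It assumes serial positions are numbered from 1 at the start of the study series; the function name `lag` is a hypothetical helper.

```python
def lag(list_length, serial_position):
    """Number of study items between the matching study item and the probe.

    serial_position is 1-based; the last study item in the series has lag 0.
    """
    if not 1 <= serial_position <= list_length:
        raise ValueError("serial_position must lie between 1 and list_length")
    return list_length - serial_position

# Lags available at each list length used in Experiment 1 (LL = 1 to 4).
lags_by_length = {
    ll: [lag(ll, pos) for pos in range(1, ll + 1)] for ll in range(1, 5)
}
```

Recoding by lag aligns the most recent item across list lengths, so curves for different list lengths can be compared at equal recency.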

Discussion For both target and lure trials, the proportion of correct responses
declines as study series increase in length. This list length effect may arise because,
as study lists lengthen, the recency effect becomes diluted: more of the less
well-remembered, preceding items are averaged into the mix. The striking coincidence of
the four curves at the left side of Figure 3 suggests that when target trials are equated
for recency, there is little or no residual list length effect. This outcome is consistent
with results on recognition memory for series of 1-4 low-dimensional, compound gratings
(Kahana & Sekuler, 2002).

Figure 3. Recognition performance for target (left side of graph) and lure trials (right side).
For target trials, proportion correct recognition is shown as a function of the lag between
p and the study item that matched p. Curves are shown for study series 1, 2, 3 and 4
items long, with dot size corresponding to list length (larger dot for longer study lists).
The four dots at the right side of the graph show proportion correct for lure trials. Error
bars represent ±1 standard error of the mean calculated according to the method of Loftus
and Masson (1994).

Figure  3  shows  that  proportion  of  correct  recognitions  on  lure  trials  fell  with 
increasing  numbers  of  faces  in  the  study  set.  Before  attributing  this  fall  to  some 
memory-related process, we must consider an alternative possibility. Because our
study lists were generated at random, and because any face could appear just once
in a study series, as series length grew, so too did the probability that one of the
study faces on lure trials would be similar to p. If such similarity promoted false
alarms (saying “yes” when no study item actually matched p), then that alone could
generate  an  effect  of  series  length,  even  in  the  absence  of  any  degradation  in  memory 
with  series  length. 

To  examine  this  possibility,  we  reanalyzed  those  lure  trials  on  which  the  study 
series  had  just  one  face.  We  reasoned  that  these  one-item  series  would  be  least  affected 
by  any  degradation  in  memory.  These  results  were  then  fed  into  a  model  we  describe 
as  a  Perfect  Rememberer,  which  was  then  used  to  predict  performance  on  longer  study 
series, assuming that memory was perfect, that is, unaffected by the number of faces in
the study series. This simple model allows us to evaluate similarity’s first-order effects
on recognition memory performance. In addition, the Perfect Rememberer sets the
stage for more detailed quantitative modeling using NEMo, which will be introduced
in conjunction with Experiments 3 and 4; those experiments
provided sufficient data to support a model-based analysis of individual lists of study
and p items.

For  treatment  by  the  Perfect  Rememberer  we  classified  all  one-item  lure  trials 
according  to  the  distance  between  p  and  the  study  face.  For  this  purpose,  “distance” 
was taken as the Euclidean distance between a pair of stimuli, i.e., the distances between
faces  in  Figure  2A.  We  sorted  the  trials  into  bins  according  to  the  distance  between 
the  study  face  and  p.  Ten  bins  were  generated  with  equal  numbers  of  lure  trials  in 
each.  All  target  trials  were  put  into  one  bin  for  which  the  distance  between  p  and  the 
study  face  was  zero.  Finally,  for  each  bin  of  trials  we  calculated  the  mean  proportion 
of  trials  on  which  subjects  recognized  an  item  as  a  target.  These  means,  which  are 
plotted  in  Figure  4A,  were  used  to  predict  what  a  hypothetical  Perfect  Rememberer 
would do when given series 2, 3 or 4 faces long.[1]
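
The equal-count binning described in this paragraph might look like the following sketch. The simulated trials are placeholders generated from an arbitrary decreasing function, not the experimental data, and `bin_by_distance` is a hypothetical helper name.

```python
import numpy as np

def bin_by_distance(distances, said_yes, n_bins=10):
    """Split trials into equal-count bins by probe-study distance.

    Returns a list of (mean distance, proportion of "yes" responses) per bin.
    """
    distances = np.asarray(distances, dtype=float)
    said_yes = np.asarray(said_yes, dtype=float)
    order = np.argsort(distances)           # trial indices, nearest probe first
    bins = np.array_split(order, n_bins)    # bins of (nearly) equal size
    return [(distances[idx].mean(), said_yes[idx].mean()) for idx in bins]

# Placeholder one-item lure trials: "yes" becomes less likely as distance grows.
rng = np.random.default_rng(2)
d = rng.uniform(0.0, 0.3, size=200)
yes = rng.random(200) < np.exp(-10.0 * d)

summary = bin_by_distance(d, yes)
```

Equal-count bins keep the precision of each bin's proportion roughly constant, at the cost of unequal bin widths along the distance axis.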

The Perfect Rememberer’s expected responses are based on two simple assumptions.
First, we assume that increasing the number of study items from the baseline
of one has no effect whatever on memory, that is, there is perfect memory of each
study face seen on a trial, with memory for any one study face being completely
undisturbed by the presentation of other faces. Second, we assume that responses
are determined solely by the one study face that is most similar to p, as defined in
the space of Figure 2A. With these assumptions, the proportion correct on lure trials
can be obtained from the best-fitting curve shown in Figure 4A.

[1] Because of the pre-probe delay in all study series, including series with just one face,
we cannot say with confidence that the data for that shortest study series were entirely
unaffected by decay of memory during that 1200 msec pre-probe delay. However, any such
effect could not account for the changes in performance with the distance between p and
study face, as shown in Figure 4A.

As this figure shows, the Perfect Rememberer is well described by an exponential
function. The best-fitting exponential was

$$\Pr(\text{"yes"}) = a\,e^{-\tau\, d(p,s)^{c}} \tag{1}$$

where d(p, s) is the Euclidean distance between the probe and the stimulus face. The
value of c was fixed at one, which produced a simple exponential function. The
best-fitting value of a, the y-intercept, was 0.84, and the value of τ, the rate at
which the exponential decreased, was 9.2. These empirically-derived parameter values
closely resembled the values estimated in Kahana and Sekuler’s (2002) application of
NEMo to recognition results with simpler, compound grating stimuli. As the Perfect
Rememberer depends upon the relationship between the physical distance and the
corresponding behavioral outcome, this function serves as a distance-similarity
function. The parameters of the best-fitting exponential distance-similarity function
for the Perfect Rememberer would be used later in model simulations of results from
Experiment 3. The importation of this empirically-derived function was meant to
reduce the number of free parameters in our modeling.
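
A minimal sketch of the Perfect Rememberer follows, using the parameter values reported above (a = 0.84, rate 9.2, c = 1) and the stated assumption that only the study face nearest to p drives the response. The function names and the example distances are hypothetical, and the sketch is an interpretation of the description rather than the code used in the project.

```python
import math

A = 0.84     # y-intercept reported in the text
TAU = 9.2    # decay rate reported in the text
C = 1.0      # exponent fixed at one (simple exponential)

def p_yes(distance):
    """Probability of a "yes" response for a given probe-study distance."""
    return A * math.exp(-TAU * distance ** C)

def lure_proportion_correct(probe_to_study_distances):
    """Predicted proportion correct on a lure trial.

    Assumes perfect memory for every study face and a response driven solely
    by the study face nearest to the probe.
    """
    nearest = min(probe_to_study_distances)
    return 1.0 - p_yes(nearest)

# Hypothetical lure trials: adding study faces can only bring the nearest face closer.
one_item = lure_proportion_correct([0.20])
three_items = lure_proportion_correct([0.20, 0.12, 0.28])
```

Because the prediction depends only on the minimum distance, longer study series lower the predicted lure accuracy purely through similarity, with no memory loss built in.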

The gray bars in Figure 4B show the Perfect Rememberer's expected mean performance
on lure trials for study series of length 2, 3 and 4. As expected, these
values declined with series length, even though they came from a model that
was memory-free. Note, however, that this memory-free decline differed substantially
from the actual results of lure trials in Experiment 1. These latter values are shown
by the black bars in the figure. Thus, we conclude that a substantial portion of the
series length effect was caused, not by similarity alone, but by some process that
is memory-dependent.
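
The Perfect Rememberer prediction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it uses the fitted values a = 0.84 and a decay rate of 9.2 from the text, assumes that a correct lure response is a "no" (so accuracy is 1 minus the probability of "yes"), and uses made-up face coordinates. The function names are hypothetical.

```python
import math

A = 0.84    # y-intercept of the fitted exponential (value from the text)
TAU = 9.2   # decay rate of the fitted exponential (value from the text)

def euclidean(u, v):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def p_yes(distance, a=A, tau=TAU):
    """Equation 1 with c = 1: probability of a "yes" response."""
    return a * math.exp(-tau * distance)

def perfect_rememberer_lure_accuracy(probe, study_faces):
    """Predicted proportion correct on a lure trial.

    Only the study face nearest to the probe matters. A correct lure
    response is "no", so accuracy is assumed to be 1 - Pr("yes").
    """
    nearest = min(euclidean(probe, s) for s in study_faces)
    return 1.0 - p_yes(nearest)

# Hypothetical coordinates in a three-dimensional face space (illustration only).
probe = (0.08, 0.0, 0.0)
series = [(0.0, 0.12, 0.0), (0.16, 0.0, 0.0), (0.0, 0.0, 0.04)]
accuracy = perfect_rememberer_lure_accuracy(probe, series)
print(round(accuracy, 3))
```

Under this rule, adding a study face can only keep the nearest distance the same or shrink it, which is why predicted lure accuracy falls with series length even without any memory loss.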

The  effect  of  series  length  obtained  here  with  synthetic  faces  accords  well  with 
results  from  an  analogous  experiment  on  episodic  recognition  of  lower-dimensional 
stimuli,  compound  sinusoidal  gratings  (Kahana  &  Sekuler,  2002).  Specifically,  with 
both  kinds  of  stimuli,  mean  recognition  performance  declines  with  series  length,  there 
is  no  clear  sign  of  retroactive  interference  effects,  and  a  strong  recency  effect  is  seen, 
but  no  primacy  effect. 

Experiment  2 

Experiment 1 demonstrated that one face's similarity to another influences
episodic recognition memory. However, the precise nature of similarity's role in memory
remains to be clarified. When sets of study items were composed for Experiment 1,
no more than one face was allowed to come from each category of face, A ... D. As a
result, study items differed from one another in two ways: the categories from which
they were drawn, and each face's distance from the mean face. Scrutiny of items that
lie in the same row of Figure 1 suggests that for many faces, one can identify the
characteristics each shares with others in its row. These shared characteristics permit
the faces to be categorized.

Figure 4. Panel A. Proportion correct recognition as a function of the difference between
p and the study item in face space. Data are for study series of length 1. Error bars
represent ±1 standard error of the mean; standard errors of the probe-study item distance
were smaller than the width of the dots. Panel B. Predicted and obtained proportion correct
recognition responses on lure trials for study series of length 2, 3 and 4. Predicted values
assume perfect memory and judgments made on the basis of p's proximity to that study item to
which p is nearest, in face space.

Various  studies  have  demonstrated  that  categorization  can  affect  perception. 
When  subjects  learn  to  categorize  stimuli,  physical  differences  between  categories 
gain  importance  relative  to  any  within-category  differences  (for  example,  Goldstone, 
1994,  1998;  Levin  &  Beale,  2000).  This  is  called  the  categorical  perception  effect, 
and  is  usually  assessed  under  conditions  in  which  subjects  are  allowed  or  encouraged 
to  encode  items  together  with  the  categorical  attributes  of  those  items.  To  minimize 
category  effects  in  the  present  experiments,  we  kept  stimulus  duration  and  ISI  brief. 
The  model  we  intended  to  test  later,  NEMo,  is  mute  about  categorization,  making  no 
provision  for  such  effects.  If  differences  in  category  membership  did  affect  recognition 
memory  performance,  NEMo  would  have  to  be  significantly  revised.  Therefore,  it 
was  important  to  determine  whether  category  membership  played  some  role  in  our 
short-term  visual  memory  paradigm. 

In Experiment 2, we aimed to investigate the link between categorization and
visual short-term memory. We compared memory for three-item study series,
which were generated according to two different rules. One rule forced all three study
faces to come from different categories of faces, A ... D; the other rule allowed two or
even three study faces to come from the same category. We use the terms "between-category"
and "within-category" to describe the series resulting from application of
the two rules. Note that between-category series are formed according to the same
rule that generated series in Experiment 1.

Methods 

Apparatus  and  Stimuli  The  stimuli  and  apparatus  were  as  in  Experiment 
1  except  that  multiple  subjects  were  tested  simultaneously,  using  computers  in  a 
classroom  cluster.  Although  subjects  did  not  use  chin  rests,  they  were  encouraged  to 
maintain  a  constant  viewing  distance  of  approximately  57  cm  from  their  computer. 

Subjects Twenty-nine Brandeis undergraduates participated as part of a course
requirement; during a session, after several practice trials, each subject completed 436
trials. All subjects were naive to the experimental purpose, and none had taken part
in Experiment 1.

Procedure  The  procedure  was  the  same  as  Experiment  1  except  that  (a)  study 
series  always  comprised  three  faces,  and  (b)  study  items  and  probes  were  presented 
for  250  msec  each.  During  a  testing  session,  between-category  and  within-category 
series  were  randomly  intermixed,  and  occurred  with  equal  frequency. 

Results The gray bars in Figure 5A show the proportion correct on target trials with
within-category study series. Data are separated according to the
serial position of the study face that matched p. The black bars in Figure 5A show
results with between-category study series on target trials. Proportion correct responses
were highest at the last probe position (F(2, 56) = 31.6, p < .01), replicating
Experiment 1 (compare with Figure 3). A repeated measures ANOVA showed that
the proportion of correct responses was higher for within-category series than for
between-category series, F(1, 28) = 16.5, p < .01. This effect did not vary significantly
with probe position, F(2, 56) = .054, p = .95.

We  suspected  that  the  differences  between  the  two  conditions  shown  in  Figure  5A 
might  have  overestimated  the  real  differences  between  those  conditions.  On  average, 
a  study  set  whose  members  came  from  different  face  categories  would  tend  to  have 
members  more  different  from  one  another,  in  Euclidean  space,  than  would  the  average 
study  set  whose  members  all  came  from  a  single  category  of  faces.  Because  recognition 
accuracy  is  influenced  by  stimulus  similarity  (Figure  4)  we  reanalyzed  the  data  from 
Experiment  2,  equating  between-  and  within-category  results  for  the  similarity  of 
study  series  members  to  one  another. 

For each stimulus series, we used the Euclidean distance between pairs of faces
to calculate that series' mean pairwise distance among its study faces. We then ordered
all stimulus series for each condition, between-category and within-category,
according to the series' mean pairwise distance. Next, from the between-category
data we discarded all series whose mean pairwise distance exceeded the largest mean
pairwise distance in the within-category series, and so obtained equivalent mean pairwise
distances for both conditions. The between-category trials that passed this test
were used to recalculate the proportion correct responses shown for that condition
in Figure 5B. Note that this correction for similarity within a study series reduced the
effect of category membership, and the difference between conditions was no longer
statistically significant, F(1, 28) = 2.42, p > 0.10.

Figure 5. Panel A. Proportion correct recognition responses on target trials for probes
matching study items in serial positions 1-3. Gray bars represent study series whose members
were in the same category of face; black bars represent series whose members came
from different categories. Arrows show proportion correct on lure trials. Panel B. Results
after correction for summed similarity. In both panels, error bars represent ±1 standard
error of the mean, corrected for within-subject variability (see Loftus and Masson (1994)
for details).
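
The equating step described above can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the analysis code actually used: the function names and the face coordinates are hypothetical, and each series is represented simply as a list of coordinate tuples.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def mean_pairwise_distance(series):
    """Mean Euclidean distance over all pairs of study faces in one series."""
    pairs = list(combinations(series, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)

def equate_between_to_within(between, within):
    """Keep only between-category series whose mean pairwise distance does
    not exceed the largest mean pairwise distance among within-category series."""
    ceiling = max(mean_pairwise_distance(s) for s in within)
    return [s for s in between if mean_pairwise_distance(s) <= ceiling]

# Hypothetical three-face series (coordinates are made up for illustration).
within = [[(0.0, 0.0, 0.0), (0.08, 0.0, 0.0), (0.16, 0.0, 0.0)]]
between_close = [(0.04, 0.0, 0.0), (0.0, 0.04, 0.0), (0.0, 0.0, 0.04)]
between_far = [(0.20, 0.0, 0.0), (0.0, 0.20, 0.0), (0.0, 0.0, 0.20)]
kept = equate_between_to_within([between_close, between_far], within)
print(len(kept))
```

Only the filtered between-category series would then enter the recalculated proportion correct.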


Discussion It appears that after the pairwise similarity of faces has been taken into
account, the category to which the faces belonged exerts a negligible effect on episodic
recognition memory. Of course, it is impossible to deny that category membership can
be consequential for various perceptual and memory tasks, under conditions different
from our own, e.g., conditions of longer stimulus exposures, longer inter-stimulus
intervals, or larger differences between faces. But our conditions clearly succeeded
in minimizing effects of category membership. Given this null result, our
subsequent experiments and modeling did not assign any special weight to category
membership when similarity-related variables were being calculated.

Experiment 3

Experiment 1 suggested that the basic features of recognition memory for a series
of briefly presented synthetic faces may not be substantially different from memory
for other, simpler stimuli tested under comparable conditions. For a deeper understanding
of memory and other processes at work in this result, we exploited a summed
similarity model that has successfully accounted for recognition memory with compound
sinusoidal gratings. To maximize the power of the modeling we needed reliable
empirical performance measures for each individual stimulus series that would be
modeled. Such measures, in our memory paradigm, require ~30 replications per series
for each subject (Kahana & Sekuler, 2002). To accommodate this many replications
within a reasonable number of trials, we culled the stimulus series from Experiment 1,
retaining only 60 series, each with three study faces, for use in Experiment 3. These
60 series spanned the full range of recognition performance.

Perceptual similarity among stimuli is central to the computational model we intended
to apply to face memory. With compound gratings as stimuli, similarity can
be defined by scaling stimuli in terms of subjects' difference thresholds for spatial
frequency (Zhou, Kahana, & Sekuler, 2004). That same approach would be burdensome
with face stimuli because the difference threshold varies substantially from one
part of the space to another, with the smallest difference thresholds in the neighborhood
of the mean face (Wilson et al., 2002). Therefore, in addition to the native metric
representations of synthetic faces, as represented in Figure 2A, we also used non-metric
multidimensional scaling (MDS) to characterize the perceptual space within
which our face stimuli were located, and to quantify the distances between faces in
that space. The data on which MDS was based came from oddity judgments made
on simultaneously presented trios of faces. To assess subjects' viewing behavior and
strategy while viewing these faces, we followed up the main experiment by measuring
the fixation behaviors of several subjects while they made oddity judgments like those
made in the main experiment.

Subjects Two male and six female volunteers aged from 20 to 25 years participated
in the main experiment, and two female volunteers aged from 20 to 22 years participated
in the follow-up, eye tracking experiment. None took part in Experiment 1
or 2, and all were naive to the purpose of this experiment. They had normal
or corrected-to-normal visual acuity as measured with Snellen targets, and normal
contrast sensitivity as measured with Pelli-Robson charts.

Procedure 

Recognition Memory Of the 60 different stimulus series, half were target trials,
in which p replicated one of three study items; the remaining series were lure trials, in
which p replicated none of the study items. The timing of stimulus presentation was
identical to that of Experiment 1. Subjects participated in four one-hour sessions
of 490 trials each. The first 10 trials of each session were treated as practice and were
eliminated from the data analysis; this left 32 replications for each series and
subject.

Multidimensional Scaling We used non-metric multidimensional scaling
(MDS) to assess the perceptual similarity structure of the synthetic faces (Lee, Byatt,
& Rhodes, 2000). The data required for such scaling were generated using the triangular
method of trials (Ennis, Mullen, Frijters, & Tindall, 1989). On each trial, three
faces were presented side by side, simultaneously for 500 ms, and subjects chose the
one that seemed most different from the other two (Romney, Brewer, & Batchelder,
1993; Wexler & Romney, 1972). To minimize the possibility that vernier-type cues
might contribute to the dissimilarity judgments, the vertical position of each face was
randomly offset by a sample from a uniform distribution spanning ±16.8 min. Each
possible stimulus pair (Face 1, Face 2) is presented with every other stimulus (Face 3).
If stimulus Face 3 is selected as the stimulus most different from the others, then
the remaining stimuli, Face 1 and Face 2, are deemed similar, either explicitly or by
default. A similarity matrix is constructed by counting the number of times that a
stimulus pair (e.g., Face 1, Face 2) is designated as "similar" when placed in combination
with various other stimuli (e.g., Face 3 ... Face 21).
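
The counting rule for the similarity matrix can be sketched as follows. This is a minimal illustration with hypothetical names and made-up trials; faces are indexed from 0, and the later conversion to dissimilarities for MDS is not shown because the text does not specify it.

```python
def similarity_counts(trials, n_faces):
    """Build a symmetric matrix of 'deemed similar' counts from triad judgments.

    trials: iterable of (triad, odd_one_out), where triad is a tuple of three
            distinct face indices and odd_one_out is the index the subject chose
            as most different. The two remaining faces are deemed similar.
    """
    counts = [[0] * n_faces for _ in range(n_faces)]
    for triad, odd in trials:
        if odd not in triad or len(set(triad)) != 3:
            raise ValueError("each trial needs three distinct faces, one chosen as odd")
        i, j = (face for face in triad if face != odd)
        counts[i][j] += 1
        counts[j][i] += 1
    return counts

# Hypothetical judgments on a 4-face set (illustration only).
trials = [((0, 1, 2), 2), ((0, 1, 3), 3), ((0, 2, 3), 0)]
matrix = similarity_counts(trials, n_faces=4)
for row in matrix:
    print(row)
```

Each cell then holds the number of trials on which that pair was left behind as the similar pair.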

To control the number of trials required for the multidimensional scaling, we used
a Balanced Incomplete Block design (Weller & Romney, 1988). For this design we
generated triads of faces ("blocks"), whose members were drawn from the complete
set of 21 faces. This selection was constrained so that each of the 210 pairs of faces
occurred in the context of 30 triads. This arrangement meant that the 30 trials whose
triads included any particular pair of faces were likely to have different faces as their
third member. The displacement of the three faces was randomly determined for each
trial.

Subjects participated in three one-hour sessions, each with 710 trials. The first
10 trials of each session were treated as practice and were eliminated from our data
analysis. The remaining 2100 triadic comparisons per subject were converted into
dissimilarity matrices, which were processed by SPSS's ALSCAL and INDSCAL routines.
All runs used a Euclidean distance model. A small, supplementary experiment was
done to examine subjects' fixation patterns while they viewed a limited number of
stimulus triads, and attempted to identify the face that was most different from the
other two.
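
As a consistency check on the design, the trial counts given above fit together arithmetically. The worked lines below use only numbers stated in the text; the one added assumption is that every triad contributes each of its three pairs once.

```latex
\begin{align*}
\text{pairs of faces:} \quad & \binom{21}{2} = \frac{21 \times 20}{2} = 210 \\
\text{scored triads per subject:} \quad & 3 \times (710 - 10) = 2100 \\
\text{triads containing a given pair:} \quad & \frac{2100 \times 3}{210} = 30
\end{align*}
```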

Apparatus and Stimuli In the main experiment, apparatus and synthetic face
stimuli were the same as in Experiment 1. In a supplementary study of subjects'
fixation behavior, stimuli were presented on a 19 inch monitor under the control of
Matlab, and viewed from a distance of 57 cm. The visual angles of the presented faces
were the same as in Experiment 1. The monitor's refresh rate was 75 Hz; display mode
was set to 800 by 600 pixels. Eye position was recorded by an ASL Eye Tracking
system (Model 504), a video-based system that uses the pupil-corneal reflection to
measure eye gaze location. Gaze was sampled at 60 Hz, and recorded to a spatial
precision of just under 0.5 degrees visual angle. The apparatus monitored only the
left eye. Prior to testing, eye position was calibrated using a standard, 17-target
fixation pattern.


Figure 6. Proportion correct recognition responses on target trials for probes matching
various study items (x-axis: Probe Position, 1-3). Error bars represent ±1 standard error
of the mean, calculated using the correction suggested by Loftus and Masson (1994). The
arrow to the right of the graph represents the mean proportion correct on lure trials. Each
star shows the proportion correct for the corresponding condition in Experiment 2.


Results 

Recognition Memory Figure 6 shows the mean proportion correct as a function
of p's serial position. An ANOVA showed a significant difference among those
positions, F(2, 14) = 24.43, p < 0.001, and an a priori comparison revealed a significant
recency effect, F(1, 7) = 32.98, p < 0.01. For comparison, corresponding results
from Experiment 2 with the same series are shown as stars in the figure.

Multidimensional Scaling MDS solutions were obtained in spaces varying
from 1 to 6 dimensions. Values of $r^2$, which represent the proportion of variance accounted
for in the scaled data, increased linearly as the number of dimensions varied from one
to three; this increase saturated thereafter. The three-dimensional solution generated
by MDS is shown in Figure 2B. Each symbol represents one synthetic face, and the
pairwise distances between symbols represent the pairwise perceptual dissimilarity of
the faces. Based on the values of $r^2$, we used three-dimensional solutions in subsequent
modeling. With the three-dimensional space, Kruskal's Stress measure was 0.26, and $r^2$
was 0.62.

Procrustes  analysis  (Dryden  &  Mardia,  1998)  linearly  transformed  the  matrix 
of  values  for  the  MDS  solution  to  bring  it  into  best  conformity  with  the  matrix 
of  pairwise  distances  in  the  faces’  physical  space.  The  outcome,  which  is  shown 
in  Figure  2B,  was  based  on  the  Euclidean  similarity  transformations  of  translation, 
reflection,  orthogonal  rotation,  and  isotropic  scaling  of  points  in  the  MDS  solution.  If 
the  perceptual  representation  of  these  synthetic  faces  were  identical  to  their  physical 
representation,  the  post-Procrustes  MDS  solution  would  be  perfectly  congruent  with 
the  representation  in  Figure  2A,  where  faces  A,  B,  and  C  are  orthogonal  to  each 
other,  and  exemplars  of  face  D  lie  on  the  diagonal. 

Clearly, the Procrustes transformation does not eliminate all residual differences
between the perceptual space, as represented by MDS, and the physical space. After
the Procrustes transformation, the sum of squared residual discrepancies between
the physical representations and the transformed-MDS representations was 0.26. We
used Monte Carlo methods to cast this sum of squares into the same units that were
used for the Euclidean physical space (Figure 2A). Procrustes analyses were done on
matrices of the faces' physical coordinates, which had been randomly perturbed to
varying, known degrees, by the addition of independent, zero-mean, Gaussian random
deviates to each of the three coordinates for each face. This operation was carried out
1,000 times for a number of Gaussian distributions with different standard deviations.
From the mean residual sums of squares associated with each standard deviation
value, we identified the random perturbation of face coordinates that produced the
same residual sum of squares seen with the Procrustes transformation of the MDS
solution. The standard deviation of the residual difference between the MDS solution
and the original physical coordinates was equivalent to 6-7%, which corresponds to
~1.5 times the separation of neighboring faces within a single category of faces, A
... D.

To  examine  the  residuals  on  a  finer  scale,  the  mean  residuals  between  the  MDS 
solution  and  the  faces’  physical  coordinates  were  calculated  and  then  sorted  into  bins 
according  to  the  distance  between  faces  in  a  study  series  and  the  mean  of  the  21  faces. 
These  values  are  plotted  in  Figure  7.  Note  that  the  magnitude  of  the  residuals  grew 
with  increasing  distance  from  the  mean  face,  suggesting  that  perceptual  and  physical 
representations  of  faces  were  most  discrepant  for  the  more  extreme  faces  in  our  set. 

As a further comparison between the MDS and physical representations of our 21
faces, we computed the vector angles between perceptual exemplars of A, B, and C.
These vector angles are shown in Table 1. Faces just 4% away from the mean face were
excluded from these calculations because in MDS space those faces clustered tightly
around the mean face, which made angle measurements for those faces meaningless.
The mean angles based on the 8, 12, and 16% data were roughly 90°, suggesting
that the perceptual similarity space preserved much of the orthogonality that had
been built into the physical representations of the faces. However, all of the angle
estimates dropped appreciably when the 20% faces were included, which confirms the
demonstration in Figure 7 that these extreme faces deviate most from Figure 2A's
space.
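
For readers who want to reproduce this kind of angle measurement, a minimal sketch follows. It assumes that each angle is taken between the vectors running from the mean face to two exemplars at the same distance; the text does not spell out this detail. The coordinates and the function name are hypothetical.

```python
import math

def angle_between(u, v, origin):
    """Angle in degrees between the vectors origin->u and origin->v."""
    a = [x - o for x, o in zip(u, origin)]
    b = [x - o for x, o in zip(v, origin)]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))

# Hypothetical MDS coordinates for exemplars of faces A and B and for the mean face.
mean_face = (0.0, 0.0, 0.0)
face_a = (0.12, 0.01, 0.0)
face_b = (0.02, 0.11, 0.0)
print(round(angle_between(face_a, face_b, mean_face), 1))
```

The sketch also shows why exemplars lying very close to the mean face are problematic: as the vector lengths approach zero, small coordinate errors produce large swings in the computed angle.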


Table 1: Angles (in degrees) Between Perceptual Representations of Faces A-C

Face Distance          AB Angle    AC Angle    BC Angle
8%                       125.7        65.3       116.1
12%                       83.2        74.2        80.7
16%                       57.4        71.5        83.4
20%                       46.5        54.9        63.3
Mean, all distances       78.2        66.5        85.9
Mean, 8% to 16%           88.8        70.3        93.4

Model NEMo, the model we applied to the recognition memory data, had previously
been used to account for recognition memory with simple, low-dimensional stimuli,
namely compound gratings (Kahana & Sekuler, 2002). NEMo departs from the classic
summed similarity models of item recognition (e.g., McKinley & Nosofsky, 1996;
Nosofsky, 1986) by allowing recognition judgments to be determined not only by the
similarity between the probe, p, on one hand, and each study stimulus, on the other,
but also by similarities among study items themselves. Given a series of L study
items, $s_1 \ldots s_L$, and a probe, p, NEMo responds "yes" if:

$$\underbrace{\sum_{i=1}^{L} \alpha_i\, \eta(p,\, s_i + \epsilon_i)}_{\text{Summed Probe-Item Similarity}}
\;+\; \underbrace{\frac{2}{L(L-1)}\, \beta \sum_{i=1}^{L-1} \sum_{j=i+1}^{L} \eta(s_i + \epsilon_i,\, s_j + \epsilon_j)}_{\text{Mean Inter-stimulus Similarity}}
\;>\; C_L \qquad (2)$$

where $\eta(p, s_i)$ is the perceptual similarity between p and the ith study item (see
Equation 3, below); $\epsilon$ is a vector representing the noise associated with each stimulus
dimension, $\alpha_i$ is the weight given the ith study item, and $C_L$ represents the optimal
criterion for a series of L study items. To allow for the possibility that subjects'
decision rule might incorporate inter-item similarity, NEMo adds together (i) summed
similarity and (ii) inter-stimulus similarity, weighting the latter by a parameter $\beta$. If
$\beta = 0$ the model reduces to a standard summed similarity model (Nosofsky, 1986)
with noisy item representations (Ennis, 1988) and a deterministic decision rule. If
$\beta < 0$, a given lure becomes more tempting when $s_1 \ldots s_L$ are widely separated from
each other. Conversely, if $\beta > 0$, a lure becomes less tempting, and fewer "yes"
responses are made, when $s_1 \ldots s_L$ are widely separated from each other.

Figure 7. Mean distance between faces' representations in physical space and in perceptual
(MDS) space, as a function of distance from the mean face.

In this model, the similarity, $\eta(s_i, s_j)$, between representations $s_i$ and $s_j$ is given
by:

$$\eta(s_i, s_j) = a\,e^{-\tau\, d(s_i, s_j)^{c}} \qquad (3)$$

where d is the weighted distance between the two stimulus vectors, and $\tau$, c and a
jointly determine the form of the generalization gradient.
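
To make the decision rule concrete, the following sketch simulates single NEMo trials according to Equations 2 and 3. It is an illustration under stated assumptions, not the implementation used for the reported fits: distance is plain (unweighted) Euclidean distance, noise is specified as a standard deviation per dimension, and the item weights, noise levels and face coordinates are made up. The values a = 0.84, a decay rate of 9.2, a criterion of 0.5 and an inter-item weight of -0.53 are taken from the text. All function names are hypothetical.

```python
import math
import random

def euclidean(u, v):
    """Unweighted Euclidean distance (the model's distance is weighted)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def similarity(u, v, a, tau, c=1.0):
    """Equation 3: exponential generalization gradient."""
    return a * math.exp(-tau * euclidean(u, v) ** c)

def nemo_says_yes(probe, study, alphas, noise_sd, beta,
                  a=0.84, tau=9.2, criterion=0.5, rng=random):
    """One simulated trial of the Equation 2 decision rule."""
    n_items = len(study)
    # Perturb each study item with independent Gaussian noise per dimension.
    noisy = [[x + rng.gauss(0.0, sd) for x, sd in zip(item, noise_sd)]
             for item in study]
    summed = sum(alpha * similarity(probe, item, a, tau)
                 for alpha, item in zip(alphas, noisy))
    inter = 0.0
    if n_items > 1:
        pair_sum = sum(similarity(noisy[i], noisy[j], a, tau)
                       for i in range(n_items - 1)
                       for j in range(i + 1, n_items))
        inter = 2.0 / (n_items * (n_items - 1)) * beta * pair_sum
    return summed + inter > criterion

def proportion_yes(probe, study, n_trials=1000, seed=0, **params):
    """Estimate Pr("yes") for one stimulus series by repeated simulation."""
    rng = random.Random(seed)
    yes = sum(nemo_says_yes(probe, study, rng=rng, **params)
              for _ in range(n_trials))
    return yes / n_trials

# Hypothetical three-face study series and parameter values (illustration only).
study = [(0.08, 0.0, 0.0), (0.0, 0.12, 0.0), (0.0, 0.0, 0.16)]
params = dict(alphas=(0.8, 0.9, 1.0), noise_sd=(0.02, 0.02, 0.02), beta=-0.53)
print(proportion_yes(study[2], study, **params))          # target trial
print(proportion_yes((0.2, 0.2, 0.2), study, **params))   # lure trial
```

Setting `beta=0.0` in this sketch recovers a plain summed similarity rule with noisy item representations.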

We defined similarity in both physical and perceptual (MDS) spaces, and then
ran two parallel simulations of NEMo, one with each definition of similarity. Among the
parameters in Equation 3, we fixed c = 1, which implements a simple exponential
generalization function. Previously, when c was a free parameter, its estimated values
tended to be very close to 1 (Kahana & Sekuler, 2002). To reduce NEMo's free
parameters further, we used independent, empirical estimates of $\tau$ and a. We
estimated similarity in physical space from the data in Figure 4A, which we took as
an empirical approximation to the similarity tuning function in physical space. After
fitting an exponential function to the data in that figure, the function's exponent and
y-intercept, 9.20 and 0.84, were used for $\tau$ and a respectively, in one set of model
simulations. We then transformed the distances along the x-axis of Figure 4A using
inter-face distances from MDS, and fit a second exponential function, this time to
the transformed data. This second function represents the tuning function defined
in perceptual space. The function's exponent and y-intercept, 11.14 and 0.91, were
used as $\tau$ and a respectively, in a second set of model simulations. Finally, we fixed
one other parameter, setting NEMo's criterion to 0.5, which is the rational, unbiased
criterion that lies halfway between the means of summed similarity values generated
on target and on lure trials. In studies of recognition memory for gratings (Kahana
& Sekuler, 2002), subjects' mean criterion was found to be very close to this value.

Simulations and Application of Model We fit NEMo to subjects' recognition
accuracy on each of the 60 different stimulus lists in Experiment 3. To find NEMo's
best-fitting parameters, a genetic algorithm (Mitchell, 1996) minimized the root-mean-squared
difference (RMSD) between observed and predicted recognition scores. In
applying the genetic algorithm, a population of 1000 random parameter sets was
allowed to evolve for 20 generations. Each parameter set was run for 1000
simulated trials on each stimulus list to produce an estimate of RMSD. Then, at the end
of every generation, each of the 500 least-fit parameter sets was replaced with a new
parameter set formed by randomly drawing each of its parameter values from one of
the 500 best-fit parameter sets. The 500 best-fit parameter sets were mutated by a
single, Gaussian parameter change with a standard deviation of 30% of a parameter's
range.
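
The search procedure just described can be sketched generically. The code below is one reading of that description, not the authors' program: it assumes that each value of a replacement set is copied from an independently chosen best-fit set, that mutated values are clipped to the parameter's range, and that the best set ever evaluated is the one returned. The toy error function stands in for the simulated RMSD, and all names are hypothetical.

```python
import math
import random

def evolve(fitness, bounds, pop_size=1000, generations=20,
           mutation_frac=0.30, seed=0):
    """Minimize `fitness` over parameters constrained to `bounds`.

    fitness: callable taking a list of parameter values and returning an
             error score such as RMSD (lower is better).
    bounds:  list of (low, high) pairs, one per parameter.
    Returns (best_params, best_score) over all evaluated parameter sets.
    """
    rng = random.Random(seed)
    n_params = len(bounds)
    half = pop_size // 2
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best_params, best_score = None, math.inf

    for _ in range(generations):
        # Score every parameter set once, then rank from best to worst.
        scored = sorted(((fitness(p), p) for p in population),
                        key=lambda pair: pair[0])
        if scored[0][0] < best_score:
            best_score, best_params = scored[0][0], list(scored[0][1])
        survivors = [p for _, p in scored[:half]]

        # Replace the least-fit half: each value of a new set is copied
        # from a randomly chosen survivor (assumed reading of the text).
        children = [[rng.choice(survivors)[k] for k in range(n_params)]
                    for _ in range(pop_size - half)]

        # Mutate each survivor with a single Gaussian parameter change.
        mutated = []
        for params in survivors:
            params = list(params)
            k = rng.randrange(n_params)
            lo, hi = bounds[k]
            step = rng.gauss(0.0, mutation_frac * (hi - lo))
            params[k] = min(hi, max(lo, params[k] + step))  # clipping is assumed
            mutated.append(params)

        population = mutated + children

    return best_params, best_score

# Toy stand-in for the model's RMSD: distance from a made-up target.
target = [0.3, -0.5]

def toy_rmsd(params):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(params, target)) / len(target))

best, score = evolve(toy_rmsd, bounds=[(0.0, 1.0), (-1.0, 0.0)], pop_size=100, seed=1)
print(best, score)
```

In an actual fit, `fitness` would run the model for the simulated trials on every stimulus list and return the RMSD between observed and predicted scores, which makes each evaluation noisy and far more expensive than this toy function.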

We performed the entire model-fitting procedure twice. In the first application,
the inter-face similarity values, $\tau$, and a used in NEMo were defined by
the physical distances between faces, that is, the differences among the parameter
sets used to generate particular exemplar faces; in the second application, inter-face
similarity values, $\tau$, and a were defined by the MDS descriptions taken from subjects'
perceptual space.

Results of Model Simulations: Fits to Mean Data We first fit average
performance across eight subjects by using similarity in physical space and in average
MDS perceptual space. Table 2 gives the best-fitting model parameters for
NEMo derived from the genetic algorithm. The column Physical shows the best parameters
using stimulus distances in physical space, and the column MDS shows the
best parameters using stimulus distances in MDS, perceptual space. The first three
parameters, $\sigma_1$, $\sigma_2$, and $\sigma_3$, are the variances of noise distributions, one for each
dimension of the three-dimensional perceptual space; $\sigma_1$, $\sigma_2$, and $\sigma_3$ correspond to
dimensions 1, 2 and 3 in Figure 2. The next two parameters, $\alpha_1$ and $\alpha_2$, represent
forgetting for the first and second item in a study series, respectively. (For the last
item in a study series, $\alpha_3$ was set to one.)

As explained earlier, $\beta$ represents the importance of inter-item similarities. Its
negative sign, $\beta = -0.53$ and $-0.34$ for the two simulations, indicates that when
study items were similar to one another the model decreased its tendency to treat a
lure as a target; equivalently, lures became more tempting when study items were widely
separated. Note, finally, that the RMSD associated with the perception-based
fit, 0.101, is somewhat smaller than the RMSD associated with the fit based on the
faces' physical representation, 0.123.


Figure 8. Proportion "yes" responses plotted against predictions from NEMo. Panel A
(with Physical Space). Interstimulus distances used in NEMo were taken from the physical
description of stimuli. Panel B (with MDS Perceptual Space). Interstimulus distances used
in NEMo were taken from the MDS solution.

Figure 8 shows the correlations between the predicted proportion of "yes" responses
from NEMo and the observed mean proportion of "yes" responses. The predictions
were made with similarity defined either by the physical representations (Figure
8A) or by the MDS solution averaged across all subjects (Figure 8B). With average
data, NEMo produced a better account of the data when it incorporated perceptual
similarity among faces. Physical and perceptual similarities produced $r^2$ = 0.68 and
0.78, respectively.
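
The two goodness-of-fit measures used in this section can be computed as in the sketch below. It assumes that the reported r-squared is the squared Pearson correlation between observed and predicted proportions across stimulus series; the data values and function names are made up for illustration.

```python
import math

def rmsd(observed, predicted):
    """Root-mean-squared difference between observed and predicted scores."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted scores."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return s_op * s_op / (s_oo * s_pp)

# Hypothetical proportions of "yes" responses for three stimulus series.
observed = [0.20, 0.50, 0.90]
predicted = [0.25, 0.45, 0.95]
print(rmsd(observed, predicted), r_squared(observed, predicted))
```

The two measures answer different questions: RMSD penalizes any offset between prediction and data, whereas the squared correlation is unchanged by a linear rescaling of the predictions.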

As  just  noted,  NEMo’s  predictions  tended  to  be  more  accurate  when  perceptual 
rather  than  just  physical  similarities  were  taken  account  of,  but  those  predictions  had 
a  number  of  clear  outliers.  These  were  stimulus  series  on  which  the  model  failed  badly, 
specifically  series  that  deviated  by  0.20  or  more  from  the  predicted  proportion  correct. 
To examine the cause of these failures, we scrutinized the makeup of these series. Of
the five outliers, we found that three contained two or more faces that deviated
from the mean face by 0.20. To this outcome, Monte Carlo simulation assigned a
probability 0.01 < p < 0.02. So, faces with the greatest deformation relative to the
mean face produced the largest errors in NEMo’s predictions. As shown in Figure
7 and in Table 1, these extreme faces deviated appreciably from the orthogonal,
physical representations of the faces. The perceptual transformations associated with
the “strangeness” of these faces could be at fault. We suspect that “strangeness”
essentially introduced an extra perceptual dimension specific to those stimuli, and
because  this  extra  dimension  is  limited  to  a  small  subset  of  faces,  it  would  not  be 
fully  represented  in  the  MDS  solution. 

Results of Model Simulations: Fits to Individual Subject Data. Having
fit the recognition results averaged over subjects, we fit individual performance by
using  physical  representations  of  faces  or  by  using  individual  MDS  solutions  for  each  of 
the  eight  subjects.  One  aim  here  was  to  obtain  confidence  limits  for  each  parameter. 
Table  3  gives  the  mean  and  the  standard  deviations  of  best  fitting  parameters  for 
each  subject.  Note  that  all  values  were  consistent  with  those  obtained  by  fitting 
the  mean  data.  Of  course,  the  RMSD  values  associated  with  individual  subject  fits 
were  somewhat  higher  than  the  RMSD  value  associated  with  fitting  the  mean  across 
subjects,  because  each  subject’s  results  would  be  based  on  relatively  few  trials  per 
study  series,  which  diminishes  the  reliability  of  the  empirical  data. 

Parameters α₁ and α₂ were significantly smaller than α₃ in both simulations. Using
physical space coordinates, F(1, 7) = 18.6, p < .01 for α₁, and F(1, 7) = 73.3,
p < .01 for α₂; with MDS perceptual space coordinates, the comparable values were
F(1, 7) = 15.3, p < .01 for α₁, and F(1, 7) = 61.0, p < .01 for α₂. These inequalities,
with α₁ and α₂ < α₃, represent forgetting. Additionally, a one-sample t-test showed
that the mean β values were significantly less than zero for the model simulation with
physical space coordinates, t(7) = −3.51, p = .01, and for the model simulation with
MDS perceptual space coordinates, t(7) = −4.15, p < .01, which confirms inter-item
similarity’s important contribution to recognition judgments.

To assess the difference between NEMo’s predictions using physical descriptions
and predictions using MDS descriptions of the faces, we compared two sets of
simulations: one with physical descriptions, the other with MDS descriptions. The
advantage of the MDS perceptual description observed with NEMo’s fit to averaged data
disappeared. On the contrary, NEMo’s predictions with physical descriptions were slightly
better than NEMo’s predictions with MDS descriptions, although the difference in
RMSD values between the two sets of simulations was not significant (matched-sample
t-test, t(7) = 2.13, p = .07).

Table 2: Best fitting parameter values for NEMo’s fit to the data

Parameter  Meaning                    Physical  MDS
σ₁         Dimension 1 noise          0.033     [illegible]
σ₂         Dimension 2 noise          0.072     [illegible]
σ₃         Dimension 3 noise          0.045     [illegible]
α₁         Forgetting of 1st item     0.47      [missing]
α₂         Forgetting of 2nd item     0.40      0.45
β          Interitem similarity       −0.53     −0.34
τ          Tuning function steepness  9.20      11.14
RMSD                                  0.123     0.101

Table 3: Means and SDs of best fitting parameters for individual subjects’ simulations

Parameter  Meaning                    Physical  SD     MDS    SD
σ₁         Dimension 1 noise          0.035     0.012  0.033  0.020
σ₂         Dimension 2 noise          0.053     0.032  0.061  0.022
σ₃         Dimension 3 noise          0.040     0.019  0.044  0.021
α₁         Forgetting of 1st item     0.48      0.34   0.53   0.34
α₂         Forgetting of 2nd item     0.37      0.21   0.42   0.21
β          Interitem similarity       −0.56     0.45   −0.67  0.46
τ          Tuning function steepness  9.20      -      11.14  -
RMSD                                  0.152     0.039  0.168  0.032

The reliability of MDS solutions. Previously, we considered one explanation
for the occasional failures in NEMo’s fit: the perceptual distortion associated with the
extreme items in our set of faces. Here, we take up another possible cause: within-subject
variability in the dissimilarity judgments on which the MDS solutions were
based. To create a dissimilarity matrix for each subject, subjects observed three faces
simultaneously for only 500 ms, and chose the face that appeared most different from
the other two. The brief exposures were influenced by the stimulus duration in our
recognition memory experiments. We reasoned that if subjects were to devote 110
msec to viewing each face in the trio, a total exposure of 500 msec would permit
about two shifts in fixation, allowing all the faces to be viewed directly. However, the
time pressures introduced by relatively brief stimulus presentations might also have
contributed to variability in subjects’ responses, which in turn would have introduced
additional variability when those responses were transformed to dissimilarity matrices.

To assess the consistency of subjects’ triadic judgments we computed two different
mean MDS solutions. One solution was based on subjects’ dissimilarity judgments
on all odd-numbered trials (that is, first, third, fifth, etc.); the second solution was
based on judgments from all even-numbered trials (that is, second, fourth, sixth,
etc.).² For each three-dimensional solution, the Euclidean distances between all face
pairs were taken, and the correlation was calculated between pairwise distances from odd
trials and pairwise distances from even trials. The results are shown as a scatterplot
in Figure 9. Despite the brief stimulus duration, and despite the 50% reduction in
the number of judgments on which each MDS solution was based, the pair of MDS
solutions produced by this process had a relatively strong correlation, with r² = 0.79,
which supports the idea that triadic comparisons generate good, reliable measures of
similarity.

²The balanced incomplete block design forced us to take this indirect approach. Differences in
the makeup of triads on successive trials prevented us from comparing judgments themselves. As a
result, our comparisons had to be mediated via MDS.

Figure 9. Distances between items in the MDS solution calculated using even trials and
those calculated using odd trials.

Individual differences among subjects. In addition to within-subject variability,
between-subject variability, that is, various individual differences, could diminish
the quality of model fits when NEMo was applied to data averaged over
subjects. In the MDS solution, individual differences were represented by a vector
of weights defined in an additional, individual difference space. Table 4 shows the
weights for each subject on each of the three dimensions, along with the r² values
for each subject against the averaged MDS solution. The variation in weights and in
r² values suggests that the average MDS solution we obtained might not equally reflect
different subjects’ perceptual representations. For example, Subject 1 and Subject
5 differed from each other, both in their dimensional weights and in their r²
values. This variation in r² values might have come from differences in the consistency
of individual subjects’ similarity judgments. To test this hypothesis, we derived
a within-subject measure of response consistency and rank ordered subjects’ consistency
measures and their r². A rank order test showed that the correlation between
the two rankings was statistically significant (Kendall’s τ = 0.84, p < .05), which was
consistent with the idea that r² values reflected subjects’ response consistency.
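As an illustration of the rank-order statistic, the following sketch computes Kendall’s τ in its simplest form (τ-a), which counts concordant and discordant pairs and ignores ties. The report does not say which variant was used, and Table 4 contains tied r² values, so a tie-corrected variant (τ-b) may be more appropriate for the actual data; the function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

Here `x` would hold each subject’s consistency measure and `y` the corresponding r² value.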


Table 4: Individual Subjects’ INDSCAL Weights and r² Values

Subject  Dim 1  Dim 2  Dim 3  r²
1        .49    .59    .41    .89
2        .45    .28    .38    .61
3        .54    .49    .41    .82
4        .63    .47    .34    .83
5        .15    .12    .13    .33
6        .45    .40    .47    .76
7        .58    .40    .32    .70
8        .41    .46    .51    .70

In a supplementary experiment, we measured several subjects’ eye fixations while
they performed triadic comparisons. Of the 60 three-item study series used in the main
part  of  Experiment  3,  we  randomly  chose  12  series  for  use  here.  Each  trio  of  faces 
was  presented  five  times  in  each  of  three  spatial  arrangements,  e.g.,  xyz,  yxz,  or 
zxy.  Repetitions  of  any  trio  were  randomly  interspersed  over  180  total  trials  per 
subject.  Three  subjects  participated  in  this  supplementary  experiment,  one  of  whom 
had  participated  in  the  previous  MDS  experiment.  On  average,  subjects  made  1.93 
fixation  shifts  per  trial  (SD  =  0.09).  On  47.7%  of  the  trials,  subjects  seemed  not  to 
fixate  every  one  of  the  three  faces  directly.  However,  for  88%  out  of  all  trials,  subjects 
selected  as  most  dissimilar  one  of  the  faces  that  they  had  directly  fixated.  As  the 
distance  between  the  center  of  one  face  and  the  center  of  a  neighboring  face  was  only 
about  5  degrees  visual  angle,  subjects  could  easily  use  perifoveal  vision  (see  Geisler  & 
Chou,  1995)  to  identify  the  most  different  face  and/or  to  decide  the  direction  in  which 
to  shift  gaze.  Surprisingly,  the  consistency  in  subjects’  judgments  was  not  mirrored 
by  consistency  in  the  pattern  of  fixations  elicited  by  a  triad  on  successive  appearances. 
In  particular,  the  order  in  which  faces  were  fixated  seemed  to  be  random,  and  not 
systematically  related  to  the  dissimilarity  judgment  that  would  be  made. 
Discussion

Multidimensional Scaling. MDS requires as input a matrix of similarity or
dissimilarity judgments. Researchers have taken various approaches to generate such
matrices. For example, the input matrix has been generated by asking subjects to
rate the distinctiveness of items presented one at a time (Valentine & Bruce, 1986;
Valentine, 1991; Valentine & Endo, 1992; Lee et al., 2000), or to rate numerically
the similarity of items presented in pairs (Nosofsky, 1991; Johnstone & Williams,
1997; Peters et al., 2003). We took a different approach, using triadic comparisons to
produce the input matrix for MDS. We chose this method in part for its efficiency in
generating many comparisons per pair of faces, and in part because the task might
bear greater resemblance to our recognition memory task. For one thing, by encouraging
subjects to distinguish among members of the briefly-presented triad, the task
engaged the rapid and sometimes subtle distinctions required in the recognition
judgments. At the same time, variation in the triad’s constituents from one presentation
to another mimicked the trialwise variation among study series.

Torgerson  (1958)  described  other  variants  of  the  method  of  triadic  comparisons, 
which  require  subjects  to  make  several  explicit  pairwise  judgments  per  trial.  Note 
that  the  single  explicit  judgment  required  on  each  trial  in  our  application  actually 
implies  that  subjects  have  made  one  or  more  pairwise  comparisons,  although  such 
comparisons are not made explicit. Letting the stimuli in the triad be i, j, and k,
our subjects’ identification of one item as most dissimilar could reflect evaluations
of inequalities among |i − j|, |i − k|, and |j − k|. Of course, when subjects are not
forced  to  make  such  evaluations  explicit,  one  cannot  rule  out  the  possibility  that, 
particularly  with  time  pressures,  subjects  might  make  only  some  subset  of  all  the 
pairwise  comparisons.  Although  based  on  incomplete  information,  such  judgments 
would  be  better  and  more  consistent  than  random  guesses. 
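The pairwise information implicit in a single odd-one-out choice can be made concrete with a short sketch. It lists the two inequalities implied by one choice and then tallies, for each pair, the proportion of triads in which the pair was split by the choice. This tally is one common way to turn triadic choices into a dissimilarity matrix; it is an assumption here, not necessarily the transformation used in this report, and the function names are hypothetical.

```python
from collections import defaultdict

def implied_inequalities(triad, odd):
    """Choosing `odd` implies both pairs containing it are farther apart
    than the remaining pair. Returns (larger_pair, smaller_pair) tuples."""
    a, b = [s for s in triad if s != odd]
    close = frozenset((a, b))
    return [(frozenset((odd, a)), close), (frozenset((odd, b)), close)]

def dissimilarity_from_triads(trials):
    """trials: iterable of (triad, odd) tuples.
    Returns {pair: proportion of presentations in which the pair was split}."""
    split = defaultdict(int)
    seen = defaultdict(int)
    for triad, odd in trials:
        i, j, k = triad
        for pair in (frozenset((i, j)), frozenset((i, k)), frozenset((j, k))):
            seen[pair] += 1
            if odd in pair:
                split[pair] += 1
    return {pair: split[pair] / seen[pair] for pair in seen}
```

A pair that is never split receives dissimilarity 0, and a pair that is split on every presentation receives 1.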

Simulations with NEMo. Episodic visual memory for synthetic faces was satisfactorily,
though not perfectly, predicted by NEMo. That is, the results were consistent
with the idea that each study item was stored as a noisy exemplar, and also with
the idea that recognition decisions depended upon both summed similarity and inter-item
similarity. When NEMo used MDS descriptions of face stimuli, the outcome was
influenced by two sources of noise that were absent when NEMo used faces’ physical
descriptions. One source of noise came from any inconsistency in subjects’ judgments
of triads of faces. Although triadic comparisons were reasonably consistent, there is
some obvious residual variability in those comparisons, as suggested by Figure 9. A
second source of potential noise came from individual differences in similarity
judgments. Table 4 shows each subject’s weights on each of three MDS dimensions. It
is clear that subjects differed in the weights they gave to different dimensions. As
a result, the mean MDS solution used in Figure 8 was not equally representative of
every subject’s own perceptual space. As these two noise sources were irrelevant to
the specification of faces in physical units, other things being equal, we might expect
a larger error in NEMo’s predictions when NEMo used face specifications from the
MDS solution. This might explain why, when we simulated individual subjects’
performance, we found no significant benefit of incorporating perceptual representations
into the simulations.
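The split-half consistency check summarized in Figure 9 can be sketched as follows: take the coordinates of every face in each of the two MDS solutions, compute all pairwise Euclidean distances, and correlate the two distance lists. The coordinate lists and function names are hypothetical; the sketch assumes the faces appear in the same order in both solutions.

```python
import math
from itertools import combinations

def pairwise_distances(coords):
    """Euclidean distance for every pair of points, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def split_half_r2(coords_odd, coords_even):
    """Squared correlation between the distances from the two half-solutions."""
    r = pearson_r(pairwise_distances(coords_odd), pairwise_distances(coords_even))
    return r * r
```

Comparing distances rather than coordinates has a practical advantage: Euclidean distances are unchanged by rotation, reflection, or translation of a solution, so the two solutions need not be aligned first.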

Experiment  4 

The simulations in Experiment 3 showed that visual memory performance could
be predicted quite well using a model that takes account of both summed similarity
and inter-item similarity. Because the inter-item similarity term is a novel addition
to the summed similarity framework, we sought a more direct demonstration that
inter-item similarity actually was important in recognition. Therefore we designed
stimulus series in which both summed similarity and inter-item similarity were
varied. We expected that the responses produced by various combinations of the two
factors would directly demonstrate the contribution of each factor, even without the
mediation of a computational model.

Methods 

Apparatus, Stimuli and Procedure. The apparatus and stimuli were the
same as in Experiment 2. The procedure was the same as in Experiment 2 except that
each trial’s three study faces were forced to come from three different categories of
faces, A...D. Study lists were first generated randomly. Because of the upper limit
on possible pair-wise distances in physical space, distances among randomly selected
faces tended to be small, which produced skewed distributions of summed similarity
and inter-item similarity. Moreover, the randomly generated lists caused some
covariance between summed similarity and inter-item similarity. To test wider ranges
of both types of similarity independently, summed similarity and inter-item similarity
had to be distributed uniformly. To generate series that met this criterion, the
distributions of the two kinds of similarity were calculated after random generation of a
set of series, and existing stimulus series were replaced by newly generated ones until
we had a set of study series that satisfied the distribution requirement.
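One simple way to obtain such a set is sketched below. It is a simplified variant of the procedure described above: instead of replacing series after the fact, it keeps a newly generated series only if its cell in a grid of (summed similarity, inter-item similarity) bins has not yet reached a quota. The generator, the statistic function, the bin ranges, and the quota are all hypothetical placeholders.

```python
import random

def cell_of(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to a bin index in 0..n_bins-1."""
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)

def sample_uniform_series(generate, stats, ranges, n_bins, quota, max_tries=100_000):
    """Accept random series until every 2-D cell holds `quota` series
    (or max_tries is exhausted). `stats(series)` returns the pair
    (summed similarity, inter-item similarity) for a series."""
    counts = {}
    kept = []
    for _ in range(max_tries):
        if len(kept) == quota * n_bins * n_bins:
            break
        series = generate()
        cell = tuple(
            cell_of(v, lo, hi, n_bins) for v, (lo, hi) in zip(stats(series), ranges)
        )
        if counts.get(cell, 0) < quota:
            counts[cell] = counts.get(cell, 0) + 1
            kept.append(series)
    return kept, counts
```

Filling a two-dimensional grid evenly makes each marginal distribution uniform and also removes the covariance between the two measures, because every combination of levels is equally represented. The `max_tries` cap guards against cells that random generation can never reach.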

Subjects. Twenty-nine Brandeis undergraduates participated as part of a course
requirement; during a single session, each subject produced 436 trials. All subjects
were naive to the experimental purpose, and none had taken part in our other
experiments.

Results and Discussion. For each series on which p was a lure, we used the
faces’ physical coordinates to calculate summed similarity between p and all study
items, and the inter-item similarity. For this purpose, similarity was defined by the
Euclidean distance in physical space, that is, the distance measures represented in
Figure 2A. We sorted the trials into the cells of a 3×3 matrix, whose rows represented
three levels of summed similarity, and whose columns represented three levels of
inter-item similarity. At the end of the sorting, each cell of the 3×3 matrix contained 24
trials per subject. The proportion of “yes” responses was calculated separately for
each of the nine cells in the matrix; these values are plotted in Figure 10. The
parameter of the family of curves is inter-item similarity. The proportion of “yes”
responses increased with summed similarity, but decreased with growth in inter-item
similarity. A repeated measures ANOVA showed that both these effects were
statistically significant, F(2, 56) = 85.6, p < .01, and F(2, 56) = 7.08, p < .05, for
summed similarity and inter-item similarity, respectively. The interaction was not
significant, F(4, 112) = 1.16, p = .33.
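The sorting step can be illustrated with a short sketch that assigns each lure trial to a low, middle, or high level on each similarity measure and then computes the proportion of “yes” responses per cell. The rank-based cut points are an assumption; the report does not state how the three levels were defined, and independent cuts on each measure guarantee equal cell sizes only when the series were constructed to be uniformly distributed, as described in the Methods.

```python
def tertile_labels(values):
    """Label each value 0 (low), 1 (middle), or 2 (high) by rank,
    so that the three groups are as equal in size as possible."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 3 // len(values)
    return labels

def proportion_yes_by_cell(summed, inter, said_yes):
    """Rows index summed-similarity level, columns inter-item level.
    Empty cells are reported as None."""
    rows = tertile_labels(summed)
    cols = tertile_labels(inter)
    total = [[0] * 3 for _ in range(3)]
    yes = [[0] * 3 for _ in range(3)]
    for r, c, y in zip(rows, cols, said_yes):
        total[r][c] += 1
        yes[r][c] += int(y)
    return [
        [yes[r][c] / total[r][c] if total[r][c] else None for c in range(3)]
        for r in range(3)
    ]
```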

Note that the directions of the two effects observed here reproduced the corresponding
effects that we saw in simulations with NEMo. Of particular interest was
the  direct  confirmation  that  inter-item  similarity  and  summed  similarity  operate  in 
opposed  directions  to  influence  recognition  judgments,  just  as  NEMo  demonstrated 
they  did. 

General  Discussion 

Physical coordinates vs MDS solutions. NEMo gave a good account of
visual recognition memory performance for face stimuli. When the model was fit to data
averaged across subjects, its account improved when face similarity was
described in terms of the MDS solution, rather than in physical coordinates. However,
the advantage of the MDS solution was lost when NEMo was run on data from
individual subjects. It is important to note that, with either perceptual or physical
representations as input, NEMo had the same number of free parameters; therefore
this null difference did not result from a difference in the number of free parameters.

One  might  think  NEMo’s  predictions  would  be  more  accurate  when  model  fits  took 
account  of  perceptual  representations.  Earlier,  we  considered  sources  of  variability  in 
the  triadic  comparisons  that  were  the  basis  for  the  MDS,  perceptual  representations. 
These  sources  of  variability  could  have  somewhat  limited  the  quality  of  model  fits. 
Another  potential  explanation  for  our  failure  to  find  improved  fits  with  a  perceptual 
representation  is  the  lack  of  strong  overall  differences  between  the  faces’  physical 
descriptions  and  their  perceptual  descriptions.  As  shown  in  Table  1  and  in  Figure  7, 
the differences between the two descriptions were small, except for those few faces that
deviated  most  from  the  mean  face.  We  speculate  that  had  our  set  of  faces  been  more 
heavily  weighted  toward  extreme  faces,  the  outcome  might  have  differed. 

We should note that the failure to gain an advantage from using a perceptual
description of face stimuli is consistent with results from one other recent study.
Peters et al. (2003) found that the performance of categorization models was either
unchanged or even slightly diminished when face stimuli were described using MDS
rather than a native, physical metric. In that study, subjects were trained to
categorize schematic Brunswik-Reiter faces (1937), as well as slightly more elaborate,
cartoon faces. These faces were defined in 4-dimensional physical spaces, which subjects
learned to bifurcate using a linear, separable criterion. After learning the category
membership of various exemplars, subjects’ categorization was tested with mixtures
of previously-seen and new faces. Peters et al.’s (2003) results suggested that
subjects did not store the learned, long-term information as individual exemplars. We
believe that this was quite different from the case in our experiments on episodic
recognition, where subjects store individual exemplars, at least for the duration of a
single trial.

Figure 10. The proportion of “yes” responses as a function of summed similarity. Values
are plotted separately for three different levels of inter-item similarity. The diameter of
each filled circle signifies the magnitude of inter-item similarity. Error bars represent ±1
standard error of the mean, corrected for within-subject variability (Loftus & Masson, 1994).

Differences from compound gratings. One of the purposes of this study was
to investigate visual memory with higher dimensional stimuli, particularly by applying
NEMo to memory for synthetic faces. The best fitting parameters obtained here
preserved the general characteristics observed in Kahana and Sekuler’s (2002) studies
with memory for compound gratings. For example, in both studies, α values captured
the observed recency effects, and the significantly negative β values suggested
that recognition judgments for both types of stimuli are modulated by inter-item
similarity. Moreover, despite the fact that τ was set as a scale free parameter and a
was set to one in the study with compound gratings, and τ were based on empirical
estimates here, the obtained τ values were close between the two studies (8.8 and 10.7
with compound gratings, 9.20 and 11.14 with synthetic faces).³ It seems, then, that
similar similarity-distance functions operate for both low-dimensional (gratings) and
high-dimensional stimuli (synthetic faces).

However,  memory  for  synthetic  faces  may  differ  in  an  important  way  from  the 
memory for compound gratings. Even though we found a recency effect with synthetic
faces as well as with gratings, attributes of the recency effect observed in this study
differed from those found with gratings. With gratings, performance increased linearly
across serial positions, from the beginning to the end of a study series (Kahana &
Sekuler, 2002). Here, though, the serial positions preceding the last one produced
essentially equivalent performance. This implies that with higher dimensional stimuli,
instead of forgetting previously seen items gradually, the last seen item diminishes
the memory for all previously seen items equally. This resembles a result reported
previously (Phillips, 1974, 1983; Phillips & Christie, 1977).

Because procedures differed between the studies, we must be cautious in attributing
various discrepancies in the results to differences in stimulus dimensionality. But
we do believe that parallel studies of recognition memory for gratings, faces, and other
high-dimensional stimuli, as well as comparable stimuli from other sensory modalities,
will ultimately help us understand how stimulus dimensions contribute to human
episodic  recognition  memory. 


³We ran simulations with various τ values, and found that differences this small did not
appreciably alter the model’s resulting RMSD. So differences in the similarity-distance functions were not
critical to the success of model fits.
Work on Objective Two: Coordinated Identification & Recognition

Many  theories  of  memory  distinguish  between  recognition  that  some  item  had 
been encountered previously (item recognition) and recognition of that item’s previous
context or source (source recognition) (for example, Johnson, Hashtroudi, & Lindsay,
1993;  Hockley  &  Cristi,  1996).  When  multiple  study  items  are  presented  successively, 
potential  source  information  includes  the  items’  temporal  order  (e.g.  ?,  ?,  ?).  Some 
researchers  have  suggested  that  item  and  source  memory  depend  upon  distinct  neural 
substrates  (Dobbins,  Foley,  Schacter,  &  Wagner,  2002;  Rugg  &  Yonelinas,  2003).  To 
assess  the  links  between  these  two  aspects  of  memory  we  generated  coordinate  assays 
of  item  and  source  memory  using  specially-designed  stimulus  materials  and  a  novel 
paradigm. 

Arguably,  verbal  study  items,  which  are  commonly  used  in  memory  studies,  are 
less than ideal for theory testing. In particular, it is difficult to specify the
representation of verbal materials in a metric space, a problem that can be significantly
curtailed when using perceptually-defined stimuli (Kahana & Sekuler, 2002;
Yotsumoto & Sekuler, in press). For example, sinusoidal luminance gratings, which can
be combined to synthesize more complex images such as textures and natural scenes,
can be described in low-dimensional spaces whose principal axes are spatial frequency
and orientation, dimensions that are extracted early in visual processing (Graham,
1989; Olzak & Thomas, 1999). Such stimuli, which can be designed to resist verbal
encoding and rehearsal (Hwang-Grodzins, Jacobs, Danker, Sekuler, & Kahana,
2005),  make  it  easy  to  manipulate  similarity  relations  among  stimuli,  and  to  exploit 
those  relations  in  computational  modeling  of  memory  (Kahana  &  Sekuler,  2002).  The 
present  work  exploits  the  metric  properties  of  memory  stimuli  in  order  to  identify  and 
quantify  influences  on  source  misidentification. 

The  present  studies  examine  episodic  memory  for  visual  textures  using  both 
old/new  recognition  judgments  (asking  subjects  to  judge  whether  a  probe  was  part 
of  a  just  presented  list)  and  source  identification  judgments  (asking  subjects  to  judge 
the  serial  position  of  the  item  within  the  list).  Subjects  expressed  their  judgments  on 
an analog rating scale, which eased the generation of receiver operating characteristics
(ROCs). ROC analysis, which is grounded in signal detection theory (Wickens,
2002), has proven fruitful in testing theories of recognition memory. Our own analysis
builds on the fact that many models of recognition and source memory make
explicit predictions about the shapes of ROCs (for example, Slotnick, Klein, Dodson,
& Shimamura, 2000; Hilford, Glanzer, Kim, & DeCarlo, 2002; Rotello, Macmillan, &
Reeder, 2004; Yonelinas & Levy, 2002).

Experiment  1 

On  each  trial,  subjects  saw  three  briefly-presented  study  items.  This  series  of 
study  stimuli,  whose  members  varied  from  trial  to  trial,  was  followed  by  a  probe 
item (p), which either matched one of the preceding study items or differed from all
three. Subjects used an analogue rating scale to identify the serial position of the
study stimulus that matched p; if no study item matched p, subjects registered that
judgment with a no response. The analogue scale also allowed subjects to express their
confidence in their judgments (Watson, Rilling, & Bourbon, 1964). These confidence
judgments were used to generate receiver operating characteristics (ROCs), which
were needed to test some theoretical predictions.
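As a generic illustration of how confidence ratings yield an ROC, the sketch below sweeps a set of criteria over the ratings and, at each criterion, computes the cumulative hit rate (Target trials rated at or above the criterion) and false-alarm rate (Lure trials rated at or above it). It covers only the old/new aspect of the judgment; the way the analogue scale in this experiment jointly encodes serial position and confidence is not modeled, and the rating values and criteria are hypothetical.

```python
def roc_points(target_ratings, lure_ratings, criteria):
    """Return (false-alarm rate, hit rate) pairs, strictest criterion first."""
    points = []
    for c in sorted(criteria, reverse=True):
        hit = sum(r >= c for r in target_ratings) / len(target_ratings)
        fa = sum(r >= c for r in lure_ratings) / len(lure_ratings)
        points.append((fa, hit))
    return points

def area_under_roc(points):
    """Trapezoidal area under the ROC, anchored at (0, 0) and (1, 1)."""
    pts = [(0.0, 0.0)] + sorted(points) + [(1.0, 1.0)]
    return sum(
        (x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:])
    )
```

With an analogue scale the criteria can be placed anywhere, so the number of ROC points is a free choice of the analyst rather than a property of the response scale.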

In Experiment 1, on 75% of all trials the probe item replicated one of the three
study items. For terminological convenience, we use the term Target (T) to designate
trials on which p replicated a study item, and the term Lure (L) to designate trials on
which p did not replicate any of the study items. Additionally, we use tᵢ to designate
the serial position in which the study item was replicated by p. Over all T trials, tᵢ
equally often was the first, second or third item in the set of study items. Allowing
T trials to occur three times as often as L trials helped the examination of possible
effects of tᵢ’s serial position.


Stimuli  Stimuli  for  each  trial  were  drawn  from  a  pool  of  compound  sinusoidal  grat¬ 
ings,  each  comprising  superimposed  vertical  and  horizontal  sinusoidal  luminance  grat¬ 
ings.  Each  compound  grating’s  luminance  profile,  LXjy,  was 


Ax,y  —  Lavg 


Aicos2Trf(x)  +  A2Cos2ng(y) 
2 


(4) 


where L_{avg} is the mean luminance; f is the spatial frequency of the stimulus’s vertical
component (vertical frequency) in cycles per degree; g is the spatial frequency of the
horizontal component (horizontal frequency); A_1 and A_2, the Michelson contrasts of the
two components, were set to 0.2, a value well above detection threshold. To minimize
edge effects, stimuli were windowed by a circular 2-D Gaussian with a space constant of
1 degree of visual angle. Before application of this window, a grating’s width subtended
5 degrees of visual angle.
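
The stimulus definition above can be illustrated with a minimal sketch. It assumes that the Gaussian window multiplies the contrast modulation (so that luminance returns to the mean away from the center) and that the 1-degree space constant is the Gaussian's standard deviation; neither detail is stated in the report. The function names and the image resolution `n_pix` are hypothetical.

```python
import math

def grating_luminance(x, y, f, g, l_avg=17.8, contrast=0.2, sigma=1.0):
    """Luminance (cd/m^2) at position (x, y), in degrees from the center.

    f and g are the vertical and horizontal spatial frequencies (cycles/degree).
    """
    # Equation 4: average of the two sinusoidal contrast modulations.
    modulation = (contrast * math.cos(2 * math.pi * f * x)
                  + contrast * math.cos(2 * math.pi * g * y)) / 2.0
    # Assumption: the circular Gaussian window scales the modulation only.
    window = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
    return l_avg * (1.0 + window * modulation)

def grating_image(f, g, size_deg=5.0, n_pix=101):
    """Square image (list of rows) spanning size_deg degrees of visual angle."""
    step = size_deg / (n_pix - 1)
    coords = [-size_deg / 2.0 + i * step for i in range(n_pix)]
    return [[grating_luminance(x, y, f, g) for x in coords] for y in coords]

image = grating_image(f=2.0, g=2.0)
```

With both contrasts at 0.2, the peak luminance at the center is 1.2 times the mean, and the windowed image approaches the mean luminance toward its edges.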

Recognition performance for visual stimuli is influenced by visual as well as
mnemonic factors, so we took steps to even out individual differences in vision. For
each subject, we generated a unique pool of compound grating stimuli by crossing five
vertical spatial frequencies with five horizontal spatial frequencies. Following Zhou
et al. (2004), we measured each subject’s visual discrimination threshold, and then
scaled all of that subject’s stimuli accordingly. This was meant to reduce the influence
of individual differences in perception on measurements of memory performance.
Frequency discrimination thresholds were measured using a staircase method in which
subjects compared the spatial frequencies (2 cycles/degree) of two briefly presented
gratings (750 msec duration each), which were separated by an inter-stimulus interval
(ISI) of 400 msec. These same temporal conditions were used in our memory experiments.
Frequency discrimination thresholds ranged from 4.3% to 13.1% (mean = 8.2%, SD = 3.1%).
Vertical as well as horizontal spatial frequencies were allowed to assume values of
2 cycles/degree ± 3 or 6 times a subject’s Weber fraction for changes in spatial
frequency. As the mean Weber fraction was 0.082, the spatial frequencies of the average
stimuli were 1.51, 1.75, 2.0, 2.25, and 2.49 cycles/degree. Figure 11 illustrates the
set of compound gratings that correspond to these typical values.

Figure 11. The average set of stimuli used in the experiment. Within each row, vertical
spatial frequency changes by three or six threshold units, decreasing and increasing
from the mean of 2 cycles/degree, shown in the center of the stimulus matrix. Within
each column, horizontal spatial frequency changes in the same way, again relative to
the mean of 2.0 cycles/degree.
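
As a check on the arithmetic, the five spatial frequencies for the average subject can be reproduced with a short sketch. It assumes the offsets are added to the 2 cycles/degree base in units of the Weber fraction, which is the reading that yields the values listed in the text; the function name is hypothetical.

```python
from itertools import product

def stimulus_frequencies(weber_fraction, base=2.0, steps=(-6, -3, 0, 3, 6)):
    """Five spatial frequencies (cycles/degree) tailored to one subject."""
    return [base + k * weber_fraction for k in steps]

freqs = stimulus_frequencies(0.082)        # mean Weber fraction from the text
pool = list(product(freqs, freqs))         # (vertical, horizontal) pairs

print([round(f, 2) for f in freqs])        # [1.51, 1.75, 2.0, 2.25, 2.49]
print(len(pool))                           # 25
```

Crossing the five vertical with the five horizontal frequencies gives the pool of 25 compound gratings from which each trial's stimuli were drawn.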


Subjects  Subjects were ten paid volunteers whose ages ranged from 19 to 28 years
(mean = 22.9, SD = 3.3). Subjects’ acuity, measured with Landolt C targets, ranged
from 20/13 to 20/22 (mean = 20/16.8, SD = 2.7); contrast sensitivity, measured with
the Pelli-Robson charts (Pelli et al., 1988), ranged from 1.80 to 1.95 (mean = 1.92,
SD = 0.06).

Procedure  On each trial, three compound gratings were presented as study items,
followed by a probe stimulus (p). Each of the three study stimuli (s_1, s_2, and s_3) was
presented for 750 msec, with successive stimuli separated by ISIs of 400 msec. Then,
after a delay of 1000 msec, a warning tone sounded, and p was presented for 750 msec. One
second later, a response scale was presented, and remained visible until the subject’s
response had been registered. The response scale, shown at the left side of Figure
12, was used by subjects to report whether p had been in the study set, and, if so,
which study item (first, second, or third) was matched by p. This scale consisted
of four selection arms, which were labeled “None,” “First,” “Second,” and “Third.”
If p seemed to match one of the study items, subjects used the computer mouse to
identify the arm that corresponded to the serial position (1st, 2nd, or 3rd) of the study
stimulus (s_1, s_2, or s_3) that matched p. If p seemed to match none of the study items,
the subject positioned the cursor on the arm labeled “None.” In addition, subjects
were encouraged to position the cursor in a way that expressed their confidence that
they had selected the correct response arm. In particular, cursor positions near the
intersection of the four arms signaled little confidence in the judgment; positions
further away, toward an arm’s outer end, signaled high confidence.

Once the subject was satisfied with the cursor’s location, a click of the computer
mouse button caused the computer to register the cursor’s location along the response
arm. No instructions were given about the speed with which subjects should
respond. On average, once the scale was presented, a response was registered in
about 2-3 seconds, which was sufficiently short that memory would not have decayed
significantly (Kahana & Sekuler, 2002; Sekuler, Kahana, McLaughlin, Golomb, &
Wingfield, 2005).

After  each  response,  one  of  two  tones  sounded,  providing  feedback  about  response 
correctness.  On  T  trials,  feedback  was  contingent  upon  the  response’s  identification 
component:  feedback  signalled  whether  the  subject’s  response  correctly  identified 
which study item, s_1, s_2, or s_3, matched p; like incorrect identification responses, a
“None”  response  on  a  T  trial  brought  feedback  that  the  response  was  wrong.  On  L 
trials,  feedback  was  contingent  on  whether  the  response  correctly  reflected  that  none 
of  the  study  items  matched  p;  all  other  responses,  “First,”  “Second,”  or  “Third,” 
were  followed  by  feedback  that  the  response  had  been  wrong. 

Trialwise  variation  in  stimulus  spatial  frequency  forced  subjects  to  base  judgments 
on  the  most  recently-seen  study  items;  hence,  the  requisite  memory  can  be  described 
as  episodic.  Each  subject  was  tested  on  800  trials,  distributed  across  four  one-hour 
sessions. 

The display’s mean luminance was maintained at 17.8 cd/m², which prevented
distracting luminance transients that would otherwise have accompanied change of
stimulus. A subject viewed the stimulus display from a distance of 114 cm, head
supported and steadied by a combination head rest and chin cup. Trials were self-paced.
On each trial, s_1, s_2, and s_3 were sampled randomly without replacement
from the pool of 25 stimuli that had been generated for that subject. On 75% of the
trials, p replicated s_1, s_2, or s_3, with equal frequency. On the remaining trials, p was
chosen randomly from the 22 members of the stimulus pool that were not among that
trial’s three study items. These probabilities were explained to subjects prior to the
experiment.

Figure 12. Left: Four-armed, analogue selection scale used by subjects to express
judgments in Experiments 1 and 2. Right: Two-armed, analogue selection scale used to
express recognition judgments in Experiment 3.
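
The composition of a single trial, as described in the Procedure, can be sketched as follows. This is an illustration under stated assumptions, not the software used in the experiments: it draws trial type and target position independently at random on every trial, whereas the report's wording ("equally often") suggests that the actual schedule may have been balanced. The function name and the integer-indexed pool are hypothetical.

```python
import random

def make_trial(pool, p_target=0.75, rng=random):
    """Return study items, probe, and target position (0 marks a Lure trial)."""
    study = rng.sample(pool, 3)            # s_1, s_2, s_3 without replacement
    if rng.random() < p_target:
        position = rng.randrange(3)        # each serial position equally likely
        return {"study": study, "probe": study[position],
                "target_position": position + 1}
    # Lure trial: probe comes from the 22 stimuli not shown as study items.
    remaining = [item for item in pool if item not in study]
    return {"study": study, "probe": rng.choice(remaining),
            "target_position": 0}

# Hypothetical pool: 5 x 5 grid of (vertical, horizontal) frequency indices.
pool = [(v, h) for v in range(5) for h in range(5)]
trial = make_trial(pool, rng=random.Random(1))
```

Setting `p_target=0.5` gives the balanced schedule of Experiments 2 and 3.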


Experiment  2 

Although Experiment 1 examined source identification, its overall design was
grounded in prior studies of recognition memory, in which no identification responses
were taken (Kahana & Sekuler, 2002). Because those studies used balanced schedules
of T and L trials, we were concerned that the unbalanced schedule used here
in Experiment 1 might undermine comparisons between Experiment 1 and previous
studies. Therefore, Experiment 2 replicated the conditions of Experiment 1, but with
a balanced schedule of stimuli.


Subjects  Five paid volunteers (aged 18 to 21 years, mean = 19.8, SD = 1.1) participated
in this study. None had served in the preceding experiment. Subjects’ acuity
ranged from 20/13 to 20/20 (mean = 20/16.6, SD = 2.7); contrast sensitivity ranged
from 1.80 to 1.95 (mean = 1.89, SD = 0.08); frequency discrimination thresholds ranged
from 6.0% to 10.2% (mean = 9.2, SD = 3.1). All measurements used the same techniques
as in the previous experiment.


Stimuli and Procedure  This experiment used the same stimulus set as the previous
experiment, with stimulus spatial frequencies again tailored to individual subjects’
frequency discrimination thresholds. The proportion of T trials was decreased from
75% in Experiment 1 to 50% in Experiment 2. As before, p was made to match s_1, s_2,
or s_3 with equal frequency. These probabilities were explained to the subjects prior
to the experiment. There were no other differences between this experiment and its
predecessor. Each subject was tested on 800 trials, distributed across four one-hour
sessions.

Experiment  3 

In the preceding two experiments subjects made a source memory judgment on
each trial, identifying the serial position of the study item that matched p, or
responding “None.” It was computationally simple to transform those source judgments into
equivalent recognition memory judgments, but we cannot ignore the possibility that the
result might not truly correspond to recognition measured directly. How well does
recognition measured indirectly, by means of transformed source judgments, correspond to
recognition measured when no source judgment was required? To answer this question,
for this experiment we modified the task used in the preceding two experiments,
eliminating source judgments and requiring only recognition judgments.

Subjects  Five paid volunteers (aged 18 to 20 years, mean = 18.6, SD = 0.9) participated.
None had served in the preceding experiments. Subjects’ acuity ranged
from 20/15 to 20/25 (mean = 20/18, SD = 4.5); mean contrast sensitivity was 1.95;
frequency discrimination thresholds ranged from 6.6% to 18.7% (mean = 10.6, SD = 4.7).
Measurements were made using the same techniques as in the previous experiments.

Stimuli and Procedure  The only differences between Experiment 2 and Experiment 3
were the judgment required of subjects and the response selection screen on
which judgments were registered. As shown in Figure 12 (right panel), for Experiment 3
the oblique arms of the four-arm response display were eliminated, and
subjects indicated only whether p had or had not been among the study items for
that trial; no identification response was required. To remind subjects of the task, one
arm of the response screen was labeled “yes,” and the other arm was labeled “no.”
As before, participants signaled their confidence in each judgment by clicking on the
appropriate arm, with greater distance from the center of the screen indicating greater
confidence. Subjects were informed that T and L trials would occur with equal
frequency. In other respects, Experiment 3 was identical to Experiment 2. Each subject
was tested on 800 trials, distributed across four one-hour sessions, and stimuli were
tailored to individual subjects’ frequency discrimination thresholds.

Overall  performance  measures 

Experiment 1  For a basic-level analysis, data were cast into a 4×4 stimulus-response
confusion matrix. In this matrix, stimulus types are designated
0 (for L trials) and 1, 2, and 3 (for T trials, by the serial position of the matching
study item); responses are described as “no” (p matched none of the study items),
“yes-1” (p matched the study item in serial position one), “yes-2,” and “yes-3.” Note
that the proportions in the 4×4 confusion matrix, and the marginals, correspond to
standard accuracy measures for recognition and source memory.

To assess various aspects of memory, values in the confusion matrix were combined
in various ways. The first measure, the proportion of correct source identifications,
is given by

$P(\mathrm{Id}) = P(r_i \mid t_i)$, where $i \in \{1,2,3\}$,

where t_1, t_2, and t_3 represent the serial positions whose study items could match p;
r_1, r_2, and r_3 are responses identifying p as matching the first, second, or third study
item, respectively.

The second measure, the proportion of correct recognitions, is given by

$P(R) = P(r_i \mid t_j)$, where $i,j \in \{1,2,3\}$.

The third measure, the proportion of correct source identifications conditional
upon a correct recognition, is given by

$P(\mathrm{Id} \mid R) = \{P(r_i \mid t_i) \mid P(r_i \mid t_j)\}$, where $i,j \in \{1,2,3\}$.

Table 5 gives the means and within-subject standard errors for these three
measures. Also, the table’s last row shows the proportion of all trials, T or L, on
which subjects gave a source identification response, r_1, r_2, or r_3 (on the remaining
trials, subjects responded “no”). In standard recognition experiments, this value
corresponds to the proportion of trials on which subjects would have responded “yes, p
did replicate one of the study items.”
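
The measures defined above can be computed directly from a 4×4 confusion matrix of trial counts, as in the following sketch. The counts are invented for illustration, and pooling counts in this way is an assumption; the values in Table 5 are means over subjects, which need not equal pooled proportions. Because every correct identification is also a correct recognition, P(Id|R) reduces here to the ratio P(Id)/P(R).

```python
def summary_measures(counts):
    """counts[t][r]: number of trials with stimulus type t (0 = Lure,
    1-3 = target serial position) and response r (0 = "no", 1-3 = "yes-1"
    through "yes-3")."""
    target_rows = counts[1:]
    n_target = sum(sum(row) for row in target_rows)
    n_total = n_target + sum(counts[0])

    correct_id = sum(counts[i][i] for i in (1, 2, 3))        # r_i given t_i
    recognized = sum(sum(row[1:]) for row in target_rows)    # any r_i given t_j
    said_yes = sum(sum(row[1:]) for row in counts)           # r_1, r_2 or r_3

    return {
        "P(Id)": correct_id / n_target,
        "P(R)": recognized / n_target,
        "P(Id|R)": correct_id / recognized,
        "P(yes)": said_yes / n_total,
    }

# Hypothetical counts, not data from the report.
counts = [
    [100, 30, 40, 30],   # Lure trials
    [40, 100, 40, 20],   # target in serial position 1
    [20, 30, 120, 30],   # target in serial position 2
    [10, 10, 20, 160],   # target in serial position 3
]
measures = summary_measures(counts)
```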


Table 5: Summary Statistics for Experiments 1, 2, and 3 (Means ± SEM)

Quantity              Expt. 1     Expt. 2     Expt. 3
P(Target)             0.75        0.50        0.50
P(Id)                 0.68±.04    0.59±.04    -
P(R)                  0.88±.02    0.78±.02    0.70±.03
P(Id|R)               0.77±.02    0.76±.03    -
P(r_1 ∪ r_2 ∪ r_3)    0.78±.02    0.57±.02    -

Note first that P(R) is appreciably larger than P(Id), which seems to signify
some loss of information in going from recognition (the sense that p had appeared in
the study set) to source identification, which requires some memory of a study item’s
serial position. A rough estimate of this loss of information is found in the value of
the table’s fourth measure, P(Id|R), which falls between P(Id) and P(R).
If there were no information loss, every correct recognition would be accompanied by
a correct identification, producing P(Id|R) = 1.0, rather than the value obtained, 0.77.
Finally, the last row of Table 5 shows that on slightly more than three-quarters of all
trials subjects responded that p matched one of the study items. Note that this value
is a good match to the actual probability (0.75) with which p replicated a study item.
We take this as a sign that subjects used a near-optimal strategy, matching response
probability to the actual probabilities of T and L trials.

For a more detailed examination of relationships among the trio of response measures
in Table 5, the overall proportions were broken down according to t_i, the serial
position whose study item was replicated by p. Figure 13A
shows the results. The proportion correct for L trials, 0.51±0.03, is shown by the
single data point at the right side of the panel. The three serial position curves, one
for each response measure, show a strong recency effect, with performance on T trials
improving from t_1 through t_3. A repeated measures ANOVA confirmed this result,
showing a significant main effect of serial position over all three curves in Figure 13A
(F(2,18) = 26.17, p < 0.01). Moreover, the three curves departed significantly from
parallelism, as confirmed by a significant interaction between serial position and type
of performance measure (F(4,36) = 5.25, p < 0.02). Because values for P(R) are
close to the upper limit of 1.0, an unambiguous interpretation of the interaction is not
possible, but the interaction suggests that the information loss between recognition and
identification, which was mentioned earlier, varies with serial position. Note that
P(Id) and P(R) were calculated only from T trials, that is, only when p matched one of
the three study items. False alarm rates, the proportions of L trials on which p was
identified as matching a study item, were 0.12, 0.13, and 0.11 for s_1, s_2, and s_3,
respectively. These false alarm rates did not differ reliably
across serial positions (F(2,18) = 1.60, p > 0.20).


Figure 13. Proportion of correct recognitions, identifications, and identifications given
correct recognitions as a function of the serial position of the study item replicated by
p. Shown also is the proportion of correct rejections of lure stimuli (right side). Panel A:
Results from Experiment 1; Panel B: Results from Experiment 2. Vertical bars around each
data point represent ±1 within-subject standard error (e.g., Loftus & Masson, 1994).

Experiment 2  Table 5 shows the basic results for Experiment 2. Over all target
serial positions, the proportion of correct recognitions was 0.78. The proportion of
correct source identifications was just 0.59, while the conditional probability of a
correct source identification given a correct recognition was 0.76, indicating essentially
the same partial loss of source memory as in Experiment 1. Figure 13B shows that the
proportion of correct responses for L trials was 0.55±.03. Additionally, as in
Experiment 1, the serial position of the study item matched by p had a substantial effect
(F(2,8) = 20.70, p < 0.01). False alarm rates were 0.09, 0.15, and 0.09 for t_1, t_2, and
t_3, respectively.

Experiment 3  Recognition memory performance for Experiment 3 was expressed
as the proportion of correct recognition responses: 0.70±.026. As in the previous
experiments, a repeated measures ANOVA demonstrated a main effect of the serial
position of the study item that matched p, F(2,8) = 11.62, p < 0.01. We can compare
the correct recognition measures from the second experiment, in which recognition
was calculated by summing identification responses, to the forced-choice recognition
measure in this experiment. A repeated measures ANOVA showed that the serial
position results for Experiments 2 and 3 did not differ from one another, F(2,16) =
1.74, p > 0.20.

Receiver  Operating  Characteristic  (ROC)  analysis 

To compare recognition measured by identification judgments (Experiments 1 and
2) and recognition measured directly (Experiment 3), we calculated the proportion
of correct recognition responses produced by the two tasks. Recognition measures
differed significantly between Experiments 1 and 2 (p = .02), most likely because of
the experiments’ different ratios of T to L trials. Given that T trials comprised 75%
of all trials in Experiment 1, but just 50% of trials in Experiments 2 and 3, signal
detection theory would predict these two values of P(R) to be ordered as they are.
This hypothesis is bolstered by the observation that the overall proportions of both
correct recognitions and false alarms were higher in Experiment 1 than in Experiment
2 (see Figure 13). Also consistent with this hypothesis is the similarity in P(R) for
Experiments 2 and 3, 0.78±.02 and 0.70±.03 respectively (p = .18).

The difference in stimulus schedule could have affected performance by
changing the accuracy of memory (such as might come from differences in task difficulty
and attentional demands), by changing subjects’ criterion, or by some combination
of the two. To choose among these alternatives we generated receiver operating
characteristic (ROC) curves from the judgments in each of the three experiments. In
doing this, we exploited the confidence judgments provided by subjects’ use of the
continuous, analogue rating scale (Nachmias & Steinman, 1963).

Because our rating scale was analogue rather than comprising a small, fixed number of
categories, we sought an empirical estimate of how many useful categories
(variations in confidence) were actually represented in subjects’ use of the analogue
scale. After estimating that number, we used it to set the number of categories
used to generate ROC curves from subjects’ expressions of confidence. To determine
the analogue rating scale’s usable grain, we partitioned the rating scales into varying
numbers of bins, and for each number we calculated the amount of information
transmitted by responses (Garner & Hake, 1951; Watson et al., 1964).

Using the method of Watson et al. (1964), information transmitted was calculated
for correct identification responses that had been sorted post hoc into varying numbers
of bins. The number of post hoc response bins was varied from 2 to 12, with the
constraint that for any subject, all bins contained equal numbers of responses. The
information transmitted by these correct responses grew with the number of response
categories, but reached asymptote with no more than ten response categories.
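
The binning analysis can be illustrated with a minimal sketch that sorts analogue ratings into equal-count bins and computes the information transmitted (mutual information, in bits) between a stimulus variable and the binned response. This is a generic illustration, not a reconstruction of the exact computation of Watson et al. (1964): the choice of trial type as the stimulus variable, the function names, and the example data are all assumptions. Tied ratings are assigned to bins in the order in which they occur.

```python
import math
from collections import Counter

def equal_count_bins(ratings, n_bins):
    """Assign each rating a bin index so that bins hold (nearly) equal counts."""
    order = sorted(range(len(ratings)), key=lambda i: ratings[i])
    bins = [0] * len(ratings)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(ratings)
    return bins

def information_transmitted(stimuli, responses):
    """Mutual information (bits) between two equally long discrete sequences."""
    n = len(stimuli)
    joint = Counter(zip(stimuli, responses))
    n_s = Counter(stimuli)
    n_r = Counter(responses)
    return sum((c / n) * math.log2(c * n / (n_s[s] * n_r[r]))
               for (s, r), c in joint.items())

# Hypothetical data: trial type (1 = T, 0 = L) and one analogue rating per trial.
trial_types = [1, 1, 1, 0, 0, 1, 0, 1]
ratings = [0.9, 0.4, 0.7, -0.6, 0.1, 0.8, -0.2, 0.3]
for n_bins in range(2, 5):
    bins = equal_count_bins(ratings, n_bins)
    print(n_bins, round(information_transmitted(trial_types, bins), 3))
```

Plotting information transmitted against the number of bins, and noting where the curve levels off, gives the kind of estimate of usable grain described above.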

Therefore, in generating ROC curves, we partitioned the analogue confidence responses
into ten categories, with equal numbers of responses in each (see Nachmias
& Steinman, 1963). To test the similarity of results in Experiment 1 and Experiment
2, individual ROC curves were generated for each subject, and the area under each
curve was calculated using the trapezoidal rule for numerical integration (Wickens, 2002).
In this process, separate curves were generated for s_1, s_2, and s_3; these are shown
in Figure 14. Figures 14A and 14B show results from Experiment 1 and Experiment 2,
respectively. Curves for s_1, s_2, and s_3 are plotted with filled circles, open squares,
and filled triangles, respectively. The area beneath the curves was compared for each
target serial position and for the two experiments. Although the main effect of target
serial position was significant (F(2,26) = 47.32, p < 0.01), neither the difference between
experiments (F(1,13) = 0.63, p > .40) nor the interaction of experiment and serial
position was significant (F(2,26) = 1.87, p > .15).
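
A minimal sketch of the ROC construction follows. It assumes that each trial yields a single signed confidence rating, with larger values meaning greater confidence that p was a study item; how the authors mapped cursor position to such a rating is not specified, and the function names are hypothetical. Criteria are placed at the boundaries between equal-count bins of the pooled ratings, and the area is computed with the trapezoidal rule.

```python
def roc_points(target_ratings, lure_ratings, n_bins=10):
    """Cumulative (false-alarm rate, hit rate) pairs, from strict to lax criterion."""
    pooled = sorted(target_ratings + lure_ratings)
    n = len(pooled)
    # Criteria at the boundaries between equal-count bins of the pooled ratings.
    criteria = [pooled[k * n // n_bins] for k in range(1, n_bins)]
    points = [(0.0, 0.0)]
    for c in sorted(criteria, reverse=True):
        hit = sum(r >= c for r in target_ratings) / len(target_ratings)
        fa = sum(r >= c for r in lure_ratings) / len(lure_ratings)
        points.append((fa, hit))
    points.append((1.0, 1.0))
    return points

def area_under_roc(points):
    """Trapezoidal rule over points ordered by increasing false-alarm rate."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical ratings: targets and lures that are perfectly separated.
points = roc_points([6, 7, 8, 9, 10], [1, 2, 3, 4, 5])
print(round(area_under_roc(points), 6))   # 1.0
```

An area of 0.5 corresponds to chance performance, and an area of 1.0 to perfect discrimination between Target and Lure trials.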


Figure 14. ROC curves based on source identification judgments for p in each of three
serial positions. Panel A: ROCs generated from Experiment 1’s results; Panel B: ROCs
generated from Experiment 2’s results.

Figure 15 shows the mean recognition ROC curves for all three experiments. In
generating recognition ROC curves for Experiments 1 and 2, analogue confidence
ratings for r_1, r_2, and r_3 were aggregated. For the recognition ROC curve derived
from Experiment 3, we used the analogue confidence ratings associated with “yes” and
“no” responses. These curves were generated for each subject, and the area under each
subject’s ROC curve was computed. The mean areas under the ROC (± standard error)
were 0.73±0.019, 0.75±0.027, and 0.68±0.037 for
Experiments 1, 2, and 3, respectively.

A one-way ANOVA confirmed that areas under the three ROCs did not differ
significantly from one another (F(2,19) = 1.56, p > .20). This outcome carries two
important implications. First, the similarity of areas for ROCs from Experiment 1 and
Experiment 2 (p > .40) suggests that differences in recognition performance between
the two experiments (see Figure 13A, B) most likely arose from a change in criterion,
rather than from a change in memory strength per se (Donaldson & Murdock, 1968).
Second, the similarity of areas under all three curves supports the assumption that
recognition measured directly, as in Experiment 3, is well approximated by recognition
estimated by aggregating over the three separate identification responses, r_1, r_2, and
r_3.

Next, z-transformed ROC curves were generated for each subject by cumulating
hit and false alarm rates and converting those cumulated values into standard scores
(Figure 16). Figures 16A and B show the mean zROC curves for identification in
Experiments 1 and 2, respectively; Figure 16C shows mean zROC curves for recognition
in all three experiments. In a signal detection framework, the linearity of zROC
curves suggests that the underlying noise and signal distributions are normal; if signal
and noise were normally distributed, zROC curves would be linear in z-coordinates. A
number of memory studies have reported linear zROC curves (Murdock, 1982;
Donaldson & Murdock, 1968; Ratcliff, Sheu, & Gronlund, 1992). The slope of linear zROC
curves reflects the relative variances of noise and signal distributions; if signal and
noise distributions were normal and had equal variance, the resulting zROC’s slope
would be one. If the signal distribution had greater variance than the distribution on
noise trials, zROC curves would have a slope less than one; the opposite relationship
between variances would produce a zROC slope greater than one (Wickens, 2002).
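
The relation between zROC slope and the two variances can be checked numerically. The sketch below builds ROC points from a Gaussian signal detection model (noise standard deviation fixed at 1) and fits a least-squares line to the z-transformed rates; the fitted slope equals the ratio of the noise standard deviation to the signal standard deviation. The function names and parameter values are illustrative assumptions, not values from the experiments.

```python
from statistics import NormalDist, mean

def gaussian_roc(d, sigma_signal, criteria):
    """(false-alarm, hit) pairs for noise ~ N(0, 1) and signal ~ N(d, sigma_signal)."""
    noise, signal = NormalDist(0.0, 1.0), NormalDist(d, sigma_signal)
    return [(1.0 - noise.cdf(c), 1.0 - signal.cdf(c)) for c in criteria]

def zroc_slope(roc_points):
    """Least-squares slope of z(hit rate) regressed on z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    # Rates of exactly 0 or 1 have no finite z-score, so they are dropped.
    pairs = [(z(fa), z(hit)) for fa, hit in roc_points
             if 0.0 < fa < 1.0 and 0.0 < hit < 1.0]
    zf = [p[0] for p in pairs]
    zh = [p[1] for p in pairs]
    mf, mh = mean(zf), mean(zh)
    sxx = sum((a - mf) ** 2 for a in zf)
    sxy = sum((a - mf) * (b - mh) for a, b in zip(zf, zh))
    return sxy / sxx

criteria = [-0.5, 0.0, 0.5, 1.0, 1.5]
# Signal more variable than noise: slope 1 / 1.25 = 0.8, less than one.
print(round(zroc_slope(gaussian_roc(1.0, 1.25, criteria)), 3))
# Signal less variable than noise: slope 1 / 0.8 = 1.25, greater than one.
print(round(zroc_slope(gaussian_roc(1.0, 0.8, criteria)), 3))
```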

The zROC curve generated for each experiment is well described by a linear function.
Values of r² for the linear terms in a second-order polynomial fit were > 0.95;
addition of a quadratic term improved the fit by less than 0.02, which was not
statistically reliable. The slopes for the zROC curves are shown in Figure 17. The slopes
for zROC curves generated from source memory performance (Figure 17A) tended
to be < 1.0. Although several experiments have reported slopes of zROC curves for
source memory of approximately 1.0, those experiments differ in a number of critical
respects from our own (Hilford et al., 2002; Glanzer, Hilford, & Kim, 2004). In contrast
to these zROCs with slopes less than one, zROCs generated from recognition performance,
either directly (as in Experiment 3) or indirectly (for Experiments 1 and 2), each had
a slope significantly greater than one (see Figure 17B); this was true even for
Experiment 3, whose zROC had the lowest slope (t(4) = 3.45, p < .03). The slopes from
Experiments 2 and 3 were not significantly different from one another (p = .40).

Figure 15. Receiver Operating Characteristics (ROCs) for recognition performance in
Experiments 1-3. ROCs for Experiments 1, 2, and 3 are represented by open circles, closed
circles, and filled squares, respectively.

The  few  previously  reported  recognition  zROCs  whose  slopes  were  greater  than 
one  were  attributed  to  familiarity  with  the  memory  items  (Ratcliff  et  al.,  1992),  or 
to  differing  levels  of  attention  during  encoding  (DeCarlo,  2002,  2003).  Although  we 
cannot  definitively  rule  out  these  possibilities,  we  believe  that  in  our  study  a  different 
influence  produced  zROC  slopes  greater  than  one.  We  take  this  up  below,  where  we 
show that a summed similarity recognition model predicts this result.

Figure 16. zROC curves generated from source memory and recognition performance.
Panel  A:  zROC  curves  generated  from  source  memory  performance  in  Experiment  1.  Panel 
B:  zROC  curves  generated  from  source  memory  performance  in  Experiment  2.  Panel  C: 
zROC  curves  generated  from  recognition  performance  in  Experiments  1,  2,  and  3. 


[Figure 17 plot. Panel A: horizontal axis, target’s serial position (s_1, s_2, s_3);
legend, Experiment 1 (filled circles) and Experiment 2 (open circles). Panel B: horizontal
axis, Experiment (Expt 1, Expt 2, Expt 3). Vertical axis ticks in both panels run from
0.6 to 1.4.]


Figure 17. Panel A: zROC slopes for source memory in Experiments 1 and 2. Panel B:
zROC slopes for recognition memory.

Stochastic vs. deterministic origins of source errors  In both Experiments 1
and 2, subjects made many source misidentifications. On about 25% of all T trials
subjects correctly rejected the “None” response, only to misidentify the serial position of
the study item that had actually been replicated by p.

Errors  in  source  judgments  can  be  useful  in  defining  the  information  that  subjects 
use  to  make  successful  judgments.  This  is  especially  true  when,  as  is  the  case  here, 
the  metric  properties  of  stimuli  afford  the  possibility  of  relating  misidentifications 
to  the  characteristics  of  those  stimuli.  Misidentifications  could  have  arisen  in  two 
distinct  ways.  Some  or  all  of  the  misidentifications  could  have  been  entirely  stochastic, 
reflecting  random  guesses  made  when  subjects  had  no  actual  useable  memory  of 
what  had  been  seen.  Alternatively,  misidentifications  could  have  come  from  some 
deterministic  process,  for  example,  systematic  errors  associated  with  partial  loss  of 
serial  position  information  (e.g.,  Lee  &  Estes,  1977). 

Although  pure,  information-free  guessing  implies  the  existence  of  a  threshold  and 
is  not  consistent  with  non-threshold  variants  of  signal  detection  theory,  this  hypothesis 
is useful as a starting point for analysis of misidentifications. In Experiment 1, 0.75
of all trials were T trials. As a result, information-free guesses on such trials would
have been scored as correct recognitions 0.5625 of the time, that is, 0.75 squared. Furthermore,
because p was equally likely to match s1, s2, or s3, a random guess of serial position
would have been scored as a correct identification on 0.1875 of guessed trials, that is,
0.5625/3. A source error, then, would have occurred on twice this proportion of trials,
that is, on 0.375 of trials on which the participant made an utterly random, memory-free
guess that p matched one of the study items. Note that, by definition, memory-free guesses
cannot be systematically linked to any property of the study stimuli.
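
The guessing arithmetic above can be checked in a few lines. This is only a sketch: it assumes, as the squared term suggests, that a memory-free guesser says "yes" with a probability equal to the proportion of T trials and then picks one of the three serial positions at random. The variable names are illustrative.

```python
from fractions import Fraction

# Assumption: a memory-free guesser responds "yes" at the base rate of T trials.
p_target = Fraction(3, 4)      # proportion of T trials in Experiment 1
p_guess_yes = Fraction(3, 4)   # assumed probability of guessing "yes"

# A guessed "yes" on a T trial is scored as a correct recognition.
p_correct_recognition = p_target * p_guess_yes            # 9/16 = 0.5625

# One of the three serial positions is correct; the other two are source errors.
p_correct_identification = p_correct_recognition / 3      # 3/16 = 0.1875
p_source_error = 2 * p_correct_identification             # 3/8  = 0.375

print(float(p_correct_recognition),
      float(p_correct_identification),
      float(p_source_error))
```

The two identification outcomes sum to the recognition value (0.1875 + 0.375 = 0.5625), as they should.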

A  misidentification  could  have  come  also,  not  from  random  guessing,  but  from 
one  or  more  stimulus-related  processes.  With  stimuli  resembling  those  used  in  the 
present experiments, Kahana and Sekuler (2002) showed that visual encoding of
study items was noisy, with each study item being stored as a noisy exemplar. Suppose
that  a  subject’s  judgments  involved  a  process  in  which  noisy  (variable)  exemplars 
of  each  study  item  were  compared  to  p.  This  is  the  key  computation  in  a  large 
class  of  summed  similarity  models  (Hintzman,  1988),  including  the  one  developed  by 
Kahana  and  Sekuler  (2002).  By  comparing  the  summed  similarity,  a  scalar  value, 
against an appropriate criterion, the model generates “old-new” responses. However,
this  scalar  value,  alone,  cannot  generate  identification  responses.  To  account  for  such 
responses,  we  could  augment  the  summed  similarity  framework  by  assuming  that 
a  judgment  of  serial  position  is  determined  by  positional  information  of  the  study 
item  whose  memorial  representation  most  closely  matched  p.  Because  of  noise,  on 
some trials, then, the memory for a study item that physically did not match p
might  still  more  closely  match  p  than  does  the  memory  for  the  study  item  that  was 
a  physical  match  for  p.  Additionally,  a  participant’s  memory  for  the  matching  study 
item  might  be  sufficient  to  have  produced  a  correct  identification,  but  the  participant 
misremembered  the  serial  position  in  which  that  item  had  occurred. 

To evaluate these two possible causes of misidentifications, we determined whether
misidentifications  might  be  explained  by  some  spatial  or  temporal  attribute  of  the 
stimuli.  The  spatial  attribute  was  the  pairwise  spatial  similarity  between  exemplars 
in  the  two-dimensional  Euclidean  space  within  which  our  stimuli  were  defined.  The 
likely potency of this variable was suggested by Kahana and Sekuler’s (2002) account
of false alarms in recognition judgments. For the temporal attribute, imagine that
there were some orderly, non-random forgetting of serial position information. One
form of non-random forgetting produces a “locality constraint,” which promotes local
rather than global errors. In the task at hand, items that occupied serially adjacent
positions in a study sequence would be more likely to be confused with one another
than would items in positions that were more widely separated in the sequence (Lee & Estes,
1977; Page & Norris, 1998). With partial loss of serial position
information, s1 would be more likely to be misremembered as s2 than as s3, and s3
would be more likely to be misremembered as s2 than as s1. This account is mute
on errors arising from forgetting of s2’s serial position. Note that the duration of
each  study  item  (750  msec),  together  with  the  400  msec  separating  successive  items, 
should  have  been  sufficient  to  minimize  perceptual  confusions  between  intervals,  which 
suggests  that  misidentifications  arise  from  failure  of  memory  rather  than  failures  of 
perception.

To compare competing stochastic and deterministic accounts of misidentifications,
each participant’s misidentifications were sorted into the cells of a 2 x 2 table. In
constructing the table, we considered only trials on which p actually matched the
first or last study item, s1 or s3; the remaining trials, on which p matched s2, do not
lead to unequivocal predictions for misidentifications. The table’s rows corresponded
to two levels of a variable we call spatial similarity; the table’s columns corresponded
to two levels of a variable we call temporal similarity. To generate the value of spatial
similarity, we used the metric stimulus space shown in Figure 1 to calculate the Euclidean
distance in spatial frequency between (1) p and the misidentified study item, and (2) p
and the remaining study item that did not match p. If the first of these two distances
was the smaller, we categorized spatial similarity between p and the misidentified item
as “high”; otherwise, we categorized spatial similarity as “low.” For temporal similarity,
we categorized misidentifications according to whether the error in identification
represented a shift of one serial position (high similarity) or two serial positions
(low similarity). For example, if s2 were misidentified as matching p when the actual matching
study item was s3, this error of one serial position was categorized as high temporal
similarity; if s1 were misidentified as matching p when the actual matching study
item was s3, the error of two serial positions was categorized as low temporal similarity.
A factorial cross of spatial and temporal variables produced four combinations of
spatiotemporal differences between p and the misidentified study item. Trials involving
a match to s2 were omitted from this analysis, because clear predictions involving
those trials could not be made within the theoretical framework used here.
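
The sorting rule just described can be written as a short function. This is a minimal sketch rather than the analysis code used for the report: the function name, the argument layout, and the coordinate units are assumptions made for illustration. It presumes each stimulus is a point in the two-dimensional spatial frequency space.

```python
import math

def classify_misidentification(probe, study, matched_pos, reported_pos):
    """Return (spatial, temporal) similarity labels for one misidentification.

    probe: (x, y) coordinates of p in the stimulus space (hypothetical units).
    study: list of three (x, y) coordinates for s1, s2, s3.
    matched_pos / reported_pos: serial positions 1..3 of the true match and
    of the item the subject reported as the match.
    """
    if matched_pos not in (1, 3):
        raise ValueError("only trials on which p matched s1 or s3 are analyzed")
    if reported_pos not in (1, 2, 3) or reported_pos == matched_pos:
        raise ValueError("not a misidentification")

    # The third item: neither the true match nor the misidentified item.
    remaining_pos = ({1, 2, 3} - {matched_pos, reported_pos}).pop()

    d_reported = math.dist(probe, study[reported_pos - 1])
    d_remaining = math.dist(probe, study[remaining_pos - 1])

    spatial = "high" if d_reported < d_remaining else "low"
    temporal = "high" if abs(reported_pos - matched_pos) == 1 else "low"
    return spatial, temporal

# Example: p replicates s3; the subject reports s2, which lies close to p.
study_items = [(5.0, 5.0), (3.0, 2.0), (2.0, 2.0)]
print(classify_misidentification((2.0, 2.0), study_items, 3, 2))
```

Tallying the returned label pairs over a participant's misidentifications gives the four cells of the 2 x 2 table.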

Figure 18. Proportion of misidentified items on T trials as a function of the spatial and
temporal similarity between correct and misidentified study stimuli. Panel A. Misidentifications
in Experiment 1. Panel B. Misidentifications in Experiment 2. Two levels of spatial
difference are plotted on the x-axis. Each of them is plotted separately for stimulus pairs
that were temporally similar (black bars), and for pairs that were temporally dissimilar
(gray bars).

Figure 18 shows the proportions of source errors falling into each of the four
categories. In each panel, the horizontal axis shows the two levels of spatial similarity;
black  bars  show  results  with  high  temporal  similarity,  and  gray  bars  show  results  with 
low  temporal  similarity.  Note  first  that  we  can  easily  dismiss  the  hypothesis  that 
all  misidentifications  resulted  from  a  completely  stochastic  process.  Monte  Carlo 
simulation  showed  that  a  completely  random  process,  which  would  produce  correct 
identification  of  serial  position  on  just  0.1875  of  all  trials,  would  generate  values 
as  extreme  as  the  highest  value  in  either  panel  of  Figure  18  on  fewer  than  one  in 
1,000,000  replications  of  the  experiment.  In  fact,  the  distribution  of  misidentifications 
is  consistent  with  the  alternative  hypothesis,  namely  that  both  temporal  and  physical 
similarity  induced  source  errors,  with  physical  similarity  producing  a  larger  effect  than 
temporal similarity. Moreover, the two panels in Figure 18 show no evidence of an
interaction  between  the  two  variables,  that  is,  the  black  and  gray  bars  at  low  spatial 
similarity  differ  by  about  as  much  as  the  corresponding  bars  at  high  spatial  similarity 
in  both  Experiments  1  and  2.  Although  the  errors  induced  by  perceptual  similarity 
were  larger  than  those  induced  by  temporal  similarity  in  both  experiments,  the  power 
of  temporal  similarity  seems  stronger  in  Experiment  2.  The  one  difference  between 
Experiment  1  and  2,  the  proportion  of  T  trials,  could  explain  this  difference  in  results. 

The summed similarity of p and study items is an effective predictor of recognition
memory for stimuli like those used here (Kahana & Sekuler, 2002; Nosofsky &
Kantner, 2005). Figure 19 illustrates in schematic form some key elements of NEMo,
the summed similarity model proposed by Kahana and Sekuler (2002). NEMo assumes
that study items, s1, s2, s3, are stored in memory as corresponding noisy
exemplars, m1, m2, and m3, where exemplars’ subscripts signify the order in which
the stimuli were presented. NEMo computes pairwise similarities, $\eta_1$, $\eta_2$, $\eta_3$, between
p and the noisy exemplar of each study item. If the sum, $\Sigma_\eta$, of these pairwise
similarities exceeds an (optimal) criterion, the model responds that p had been in the
study series. If $\Sigma_\eta$ fails to exceed the criterion value, the model responds that p
was not among the items in the study series. In NEMo’s computation, sets of study
and p items, together with random noise in the exemplar representations, produce
a distribution of values of $\Sigma_\eta$. To enhance the transparency of the following line of
argument and to avoid having to estimate noise parameters for this purpose, we will
put aside the effects of noise, treating exemplars as though they were noise free. In
addition, for the sake of generality we will ignore $\beta$, a parameter that distinguishes
NEMo from other summed similarity models; $\beta$ captures the similarity of each study
item to the others (Kahana & Sekuler, 2002; Nosofsky & Kantner, 2005). Neither of
these simplifying departures from NEMo is consequential for what follows.

By definition, on T trials one study item matched p. As a result of this maximum
similarity between one of the study items and p, $\Sigma_\eta$ tends to be higher on T trials
than on L trials. Note that this difference in $\Sigma_\eta$ for T and L trials will vary over trials,
depending upon the particular study items and p presented on each trial. However,
the mean difference in summed similarity between the two trial types will produce
greater than chance success in recognition. The match between p and one of the study
items has another consequence. On T trials one value of $\eta_i$ will always be constant,
at the upper limit of possible similarity values. On T trials only two values of $\eta_i$ are
free to vary, whereas on L trials all three values vary. Therefore, the variance in $\Sigma_\eta$
for T trials will tend to be smaller than the variance in $\Sigma_\eta$ for L trials. In order to
motivate the generation of ROC curves, we can cast these distributional relationships
into signal detection terms, describing the distribution of $\Sigma_\eta$ for T trials as the signal
distribution, and the distribution of $\Sigma_\eta$ for L trials as the noise distribution.
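
The variance argument above can be illustrated with a small simulation. This is a minimal sketch, not the analysis reported here. It assumes a hypothetical 5 x 5 stimulus grid, noise-free exemplars with $\beta$ ignored (the same simplifications adopted in the text), and an illustrative exponential similarity gradient `TAU`. None of these values comes from the experiments.

```python
import itertools
import math
import random
import statistics

# Assumptions: a hypothetical pool of 25 stimuli on an integer grid and an
# illustrative similarity gradient; neither is taken from the report.
POOL = list(itertools.product(range(5), repeat=2))
TAU = 1.0

def similarity(a, b):
    """Noise-free similarity, exponentially decreasing in Euclidean distance."""
    return math.exp(-TAU * math.dist(a, b))

def summed_similarity(probe, study):
    """Sum of the pairwise probe-study similarities."""
    return sum(similarity(probe, s) for s in study)

def simulate(n_trials, is_target, rng):
    """Return summed similarity values for n_trials random T or L trials."""
    sums = []
    for _ in range(n_trials):
        probe, *others = rng.sample(POOL, 4)   # four distinct stimuli
        if is_target:
            study = [probe] + others[:2]       # T trial: one item replicates p
        else:
            study = others                     # L trial: no item matches p
        sums.append(summed_similarity(probe, study))
    return sums

rng = random.Random(0)
t_sums = simulate(5000, True, rng)
l_sums = simulate(5000, False, rng)

print("mean     T, L:", statistics.fmean(t_sums), statistics.fmean(l_sums))
print("variance T, L:", statistics.pvariance(t_sums), statistics.pvariance(l_sums))
```

Under these assumptions the T distribution has the higher mean and the smaller variance, because one of its three similarity terms is pinned at the maximum value.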

Figure 19. Schematic of model showing key stages that could lead to recognition and
identification responses. Samples of noise are added to visual representations of the study
stimuli (s1, s2, s3), producing a set of corresponding noisy exemplars (m1, m2, m3), which
are stored in memory. At the presentation of the probe stimulus, the similarity between
p and each memory representation is computed and stored as $\eta_1$, $\eta_2$, $\eta_3$. These separate
similarity measurements are combined into a summed similarity value, $\Sigma_\eta$. Omitted from
this schematic representation are parameters that transform physical stimulus distances
into perceptual similarity, and $\beta$, which captures variation in the overall similarity of study
items to one another. So that the model can also identify the serial position of the study item
that matched p, one might add a max operator that returns the serial position associated
with the largest member of the similarity set, $\eta_1$, $\eta_2$, $\eta_3$.

To examine the link between zROC slope and the distributions of $\Sigma_\eta$ values on
T and L trials, we calculated the summed similarity for each trial in our study. To
minimize assumptions needed for the calculation, we substituted pairwise Euclidean
spatial frequency differences between stimuli for the corresponding perceptual differences
specified by NEMo. Although these two variables, physical and perceptual
distance, are most likely related by an exponential transform (Kahana & Sekuler,
2002), any increasing monotonic relationship between the two variables leaves the argument
unchanged. Summed p-study distances were calculated separately for T trials
and for L trials in Experiment 3. The frequency distributions of $\Sigma_\eta$ for all stimulus
sets that appeared in Experiment 3 are plotted in Figure 20A. Note that the x-axis
has been reversed so that the smallest value of summed distance lies to the right. Because
summed similarity, $\Sigma_\eta$, and summed distance are inversely related, the reversal
of the normal x-axis direction represents increased summed similarity from left to
right, and also brings the visual format of Figure 20A’s distributions into conformity
with formats commonly used in signal detection theory. As expected, the mean value
of $\Sigma_\eta$ for T trials tended to be larger than the comparable value for L trials; also,
the distribution of $\Sigma_\eta$ for L trials had larger variance than did $\Sigma_\eta$ for T trials. This
unequal variance in the empirical signal and noise distributions is consistent with the fact
that recognition zROC slopes were >1.0 (Wickens, 2002).

Figure 20. Panel A: Distributions of summed similarity values for all T trials (upper
sub-panel) and all L trials (lower sub-panel) used in Experiment 3 and in simulations of
recognition. L trials used both in Experiment 3 and in the first simulation are bracketed by
the rectangle with the thickest line. The two narrower rectangles bracket L trials used in
the second and third simulations. Panel B: Mean and SEM slopes of zROCs calculated for
three sets of L trials; the variance of L trials’ summed similarity values varied systematically
among the three sets of trials. The rightmost point shows the mean and SEM zROC slope
simulated for all stimulus sets used in Experiment 3. The thick, medium, and thin bars
correspond to the simulations with large, medium, and small variance of summed similarity,
respectively. The thickness of lines in Panel B corresponds to that in Panel A. The standard
deviation of the T distribution was 1.68; the standard deviation for L trials was 2.1.

To demonstrate that distributional differences in $\Sigma_\eta$ for T and L trials would actually
have the predicted effect on zROC slope, we applied a signal detection analysis to
the recognition data that would have been produced by several different distributions
of $\Sigma_\eta$. We simulated recognition for several different subsets of stimuli, which were
derived from the stimulus sets actually used in Experiment 3, the one experiment
in which recognition was measured directly. First, we calculated the value of $\Sigma_\eta$ for
all 400 L trials and for all 400 T trials that each subject saw. Figure 20A shows
the distributions of $\Sigma_\eta$ for the two trial types, aggregated over all subjects. Note
that because stimulus sets were generated randomly from the pool of 25 items, each
subject had been tested with a partially unique set of T and L stimuli. From each
subject’s responses to his or her 400 T and 400 L trials, we generated a zROC curve
and calculated its slope. Because these 800 trials were the actual stimulus set used
in Experiment 3, the mean slope shown in the leftmost point in Figure 20B was 1.09,
a value identical to the corresponding value obtained in the actual experiment (see
rightmost point in Figure 17B).

Having confirmed that the variances of distributions of $\Sigma_\eta$ differed between the
complete sets of T and L trials, we constituted two subsets of stimuli. For the first
subset we sought to reverse the relationship between the distributions’ variances,
while holding constant the difference between distributions’ means; we reasoned that
these new, modified distributions should produce zROC slopes below one. Starting
with each subject’s original stimulus set of 400 T and 400 L trials, we carved out a
reduced-variance L distribution by selecting 60 L trials whose $\Sigma_\eta$ values were near
the mean value of $\Sigma_\eta$. In addition, 60 T trials were randomly selected without regard
to their value of $\Sigma_\eta$. This maneuver reduced the variance for noise trials, leaving
the variance for signal trials unchanged, while also preserving the mean difference
between the two distributions. Altering the relative variances of $\Sigma_\eta$ for T and L
trials had the expected effect on zROC slope: the relatively narrower distribution
of summed similarity values on L trials produced a simulated zROC mean slope of
0.85, a value considerably below 1.0 (the rightmost value shown in Figure 20B).
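
The selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure's actual code: the helper name `nearest_to_mean` is hypothetical, and the 400 values are random stand-ins for one subject's L-trial summed similarity values.

```python
import random
import statistics

def nearest_to_mean(values, k):
    """Return the k values lying closest to the mean of `values`."""
    center = statistics.fmean(values)
    return sorted(values, key=lambda v: abs(v - center))[:k]

rng = random.Random(1)

# Hypothetical stand-in for the summed similarity values of 400 L trials.
l_values = [rng.gauss(0.0, 2.1) for _ in range(400)]

# Keep the 60 trials nearest the mean, as in the first subset.
l_subset = nearest_to_mean(l_values, 60)

print("SD, all 400 L trials:", statistics.pstdev(l_values))
print("SD, 60-trial subset: ", statistics.pstdev(l_subset))
```

Because the retained values cluster around the original mean, the subset's spread shrinks while its mean stays close to that of the full set.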

Finally, we repeated the simulation with a somewhat larger subset of values sampled
from the original distribution of $\Sigma_\eta$ from L trials; the aim was to generate a
sample of stimuli in which the variance of $\Sigma_\eta$ for L trials was approximately the same
as that for T trials, which should produce zROC slopes near 1.0. We drew 250 L trials
and 250 T trials from the original sets of 400 items. L trials were drawn without
replacement from a region in the vicinity of the mean for all L trials, but T trials for the
subset were drawn at random from the entire distribution of 400 trials. The result
was a ratio of variances between distributions that was intermediate to the ratios for
the 60-trial sets and for the complete, 400-trial sets. Again, in constructing these sets
of stimuli, we held constant the mean distance between the two distributions of $\Sigma_\eta$.
From each subject’s own responses in Experiment 3 to each of these stimulus lists,
we generated a zROC recognition curve for that subject. The mean and standard
error of the zROC slopes are shown by the middle point in Figure 20B. Note that, as
expected, the mean zROC slope here was intermediate to the mean slopes from the
other two simulated conditions, and was close to 1.0.

As mentioned earlier, the recognition zROC slopes produced in our experiments
were all >1, a result that is inconsistent with most previous reports (for example,
Ratcliff et al., 1992). In concert with expectations from signal detection theory (Egan,
1975; Wickens, 2002), the simulations represented in Figure 20A and B show that
our result arose from the differences in the variances of signal and noise distributions.
As we have demonstrated, the summed similarity values for L trials used in
Experiment 3 had larger variance than the values for T trials, a fact that signal
detection theory predicts, and our simulations confirm, would produce zROC slopes
>1.0.
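
The link between unequal variances and zROC slope can be made concrete with a short calculation. This sketch rests on assumptions that are not part of the report's analysis: it treats the signal (T) and noise (L) distributions as Gaussian, uses the standard deviations given in the Figure 20 caption (1.68 and 2.1), and picks an arbitrary mean separation. Under the Gaussian model the slope of z(hit) against z(false alarm) equals the ratio of the noise SD to the signal SD.

```python
from statistics import NormalDist

SD_T = 1.68       # SD of the T (signal) distribution, from the Figure 20 caption
SD_L = 2.10       # SD of the L (noise) distribution, from the Figure 20 caption
MEAN_DIFF = 2.0   # hypothetical separation; it shifts the zROC but not its slope

signal = NormalDist(mu=MEAN_DIFF, sigma=SD_T)
noise = NormalDist(mu=0.0, sigma=SD_L)
unit = NormalDist()   # standard normal, used for the z-transform

# Sweep the decision criterion; respond "yes" when the value exceeds it.
criteria = [-1.0, 0.0, 1.0, 2.0, 3.0]
z_hits = [unit.inv_cdf(1.0 - signal.cdf(c)) for c in criteria]
z_false_alarms = [unit.inv_cdf(1.0 - noise.cdf(c)) for c in criteria]

# The zROC is a straight line under this model, so two points fix its slope.
slope = (z_hits[-1] - z_hits[0]) / (z_false_alarms[-1] - z_false_alarms[0])
print(round(slope, 2))   # 1.25, i.e. SD_L / SD_T
```

Narrowing the noise distribution until its SD falls below that of the signal distribution drives the same ratio, and hence the slope, below 1.0.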

It  is  difficult  to  specify  how  our  results  could  be  extended  to  more  commonly  used 
memory  materials  including  words.  For  one  thing,  it  is  difficult  to  know  what  metric 
space  is  appropriate  to  describe  such  materials,  which  makes  it  hard  to  quantify  their 
similarity  relations.  However,  our  simulations  do  imply  that  one  should  not  expect 
any  standard  or  canonical  value  of  recognition  zROC  slope.  Instead,  they  suggest 
that  zROC  slope  will  almost  certainly  depend  upon  variables  such  as  the  particular 
stimulus  values  that  constitute  stimulus  lists,  the  range  and  distribution  of  pairwise 
similarities  represented  in  the  stimulus  pool,  the  algorithm  used  to  make  up  stimulus 
lists from that pool, and the number of items in the study list. For example, with
other factors held constant, the larger the study list, the smaller the impact on T trial
summed similarity of the one study item that matches p. This diminished impact will
reduce the difference in the variance in $\Sigma_\eta$ associated with T and L trials.

Our ROC results clearly deviate from their counterparts in memory results for
verbal materials. As already discussed, we find zROC slopes of 1.1 to 1.3 for recognition
and  0.8  to  0.9  for  source  identification,  whereas  studies  using  verbal  materials  report 
slopes  of  0.8  to  1.0  for  recognition  and  approximately  1.0  for  source  identification. 
For  recognition  zROC  slopes,  it  is  likely  that  in  our  experiments  lures  are  much  more 
similar  to  study  items  than  would  be  the  case  with  commonly  used  verbal  stimuli. 
For  source  recognition  results,  our  stimuli  and  task  differ  in  many  ways  from  stimuli 
and  task  used  with  verbal  materials,  which  makes  it  impossible  to  know  which  might 
actually  be  responsible  for  the  difference  in  results.  For  example,  our  task  used 
short  lists  of  study  items,  and  defined  source  in  terms  of  serial  position  that  an  item 
occupied;  analogous  studies  with  verbal  material  have  used  considerably  longer  lists 
of  study  items,  and  have  mostly  defined  source  in  terms  of  the  voice,  male  or  female, 
in which the words were heard.


Source of misattribution errors. Sensory researchers have long understood that
perceptual errors, such as illusions, can be valuable sources of insight into perception’s
normal operation (Eagleman, 2001). In the same way, from the very beginning of
systematic research on memory (Ebbinghaus, 1885/1913), errors and failures have
been extraordinarily useful in illuminating memory’s normal operation. Here, we
have focused on one particular kind of error: misattribution or source errors (Johnson
et al., 1993; Johnson, 1997).

The results in Figure 18 suggest that spatial and temporal similarity both influence
misidentifications of serial position, and that their separate effects are approximately
additive. How, though, might these influences operate? To understand how spatial
similarity might lead to source errors, consider a recent account of false alarms in
recognition judgments. As described before, Kahana and Sekuler’s (2002) NEMo
model assumes that three study items, s1, s2, s3, are stored in memory as noisy
exemplars. When the probe, p, is presented, $\eta_1 \ldots \eta_3$, the set of similarities between
p and each of the noisy exemplars, is computed. Again, subscripts signify the order
in which study stimuli were presented originally. NEMo describes each similarity
value as an exponentially decreasing function of the spatial difference between p and the
corresponding exemplars, m1, m2, m3. From the resulting similarity values, the summed
similarity, $\Sigma_\eta$, is computed. In NEMo, a value of $\Sigma_\eta > k$, where k is an optimal
criterion, constitutes evidence that at least one of the study items matched p, which
makes p seem familiar. Over trials, the probability of a recognition response (that is,
a yes response) corresponds to the proportion of trials on which $\Sigma_\eta > k$.

To implement judgments of serial position, we propose that NEMo could be expanded
to allow the model to perform an additional operation on the set of pairwise
p-study item similarities it was already computing. We are not proposing here a
full, formal, quantitative account of this modification, and considerable new data
would be required to explore any such account. However, we do think it worthwhile
to sketch out the key elements of what we see as one plausible approach to serial
position judgments that is compatible with a general, summed similarity approach.

In an expanded version of NEMo, each trial’s pairwise similarities would be processed
with a max operator, which returns the index of the largest item in $\eta_1$, $\eta_2$, $\eta_3$. In
the absence of error, this index would be the serial position of the study item that
most closely matches p. The subject bases the serial position identification upon
that returned index. Because of noise associated with each exemplar, there will be
trials on which the index returned by max will not correspond to the serial position
whose study item physically matched p, but will correspond instead to the serial
position of another study item. On such trials, the model would generate a source
error, misidentifying the serial position of the matching item. The probability of such
an error will be some monotonically decreasing function of the perceptual distance
between p and the non-matching study item. In
other words, study items that did not match p but were perceptually similar to it
would be more likely to be misidentified as the match than would study items that were
less similar to p. This is the pattern of results shown in Figure 18.
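
A small simulation shows how a max operator over noisy similarities would yield more source errors for near items than for far ones. As the text stresses, no formal account is being proposed, so this is only a sketch under assumptions of our own choosing: Gaussian noise added to each exemplar's coordinates, an exponential similarity gradient, and arbitrary stimulus positions and parameter values. The near item is placed in the non-adjacent serial position so that the effect reflects spatial similarity alone.

```python
import math
import random

def identify(probe, study, noise_sd, tau, rng):
    """Return the 1-based serial position picked by a max over similarities."""
    etas = []
    for s in study:
        # Assumed noise model: independent Gaussian jitter on each coordinate.
        exemplar = (s[0] + rng.gauss(0.0, noise_sd),
                    s[1] + rng.gauss(0.0, noise_sd))
        etas.append(math.exp(-tau * math.dist(probe, exemplar)))
    return etas.index(max(etas)) + 1

rng = random.Random(2)
probe = (0.0, 0.0)
# s1 replicates p; s2 is far from p; s3 is near p (hypothetical coordinates).
study = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0)]

counts = {1: 0, 2: 0, 3: 0}
for _ in range(20000):
    counts[identify(probe, study, noise_sd=1.0, tau=1.0, rng=rng)] += 1

print(counts)   # responses "3" (near item) should far outnumber "2" (far item)
```

The temporal component discussed next is not modeled here; it would require a separate assumption about how positional tags degrade.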

A different mechanism is required to motivate the other, temporal source of
misidentifications in our results. Drawing upon an account of temporal effects in
free recall (Howard & Kahana, 1999, 2002), we assume that the representation of
each noisy exemplar is tagged in memory with a temporal code. We assume further
that each item’s temporal tag can be degraded as a result of passage of time and/or
interference. If this degradation were partial rather than complete (Dodson, Holland,
& Shimamura, 1998), serially adjacent positions in a sequence would more likely be
confused with one another than would be positions more widely separated in a sequence.
In the case at hand, with partial loss of serial position information, s1 is more
likely to be misremembered as s2 than as s3, and s3 is more likely to be misremembered as s2
than as s1. This account is mute on errors involving forgetting of s2’s serial position.
Again, this is the pattern of results seen in Figure 18. We should note that this effect
resembles Dodson et al.’s (1998) demonstration that even when subjects misidentify
the source of information, they sometimes retain partial information about that
source.

We should note that the max operator with which NEMo might be expanded is
kin to max operators demonstrated in several other contexts. For example, Gawne
and Martin (2002) showed that some neurons in primate visual cortex behave in accordance
with a max operator when two stimuli are present simultaneously in the
receptive fields of such neurons. Closer to our own use of a max operator is the
important role that such an operator plays in competitive queuing in planning and
producing serially-ordered behaviors (Bullock & Rhodes, 2003; Rhodes, Bullock, Verwey,
Averbeck, & Page, 2004; Agam, Bullock, & Sekuler, 2005). It is noteworthy that
competitive queuing often leads to transposition errors, a counterpart to the temporal
order transpositions demonstrated in our experiments. Although other, alternative
processes may ultimately prove to be responsible for serial position identification, we
believe that a max operator could quite plausibly participate in that process.

Across the three experiments reported here, zROC curves generated for recognition
performance had slopes >1. This was true in Experiments 1 and 2, where
recognition performance was estimated from source identification judgments, as well
as in Experiment 3, where recognition performance was measured directly, that is,
from recognition judgments. Analyzing the distributional characteristics of T and
L trials, we demonstrated that zROC slopes were well accommodated by integrating
a summed similarity computation for episodic recognition and a signal detection
framework for decision making.

Two manuscripts have been submitted; one is under review, and the other is in press at the
journal Memory & Cognition. One of these manuscripts reports the work described
above, under Objective One; the other manuscript reports work done under AFOSR
support that builds on the paradigm introduced by Zhou et al. (2004) and applies the
modeling techniques developed by Kahana and Sekuler (2002). In addition to these
published or submitted manuscripts, the PI and graduate researchers on the project
have presented subsets of the results at meetings of the Vision Sciences Society in
2003, 2004 and 2005, the Society for Mathematical Psychology in 2004, the European
Conference on Visual Perception in 2003, and the Psychonomic Society in 2004. In
addition, the PI took part in the AFOSR Cognition Program Review in 2004.

Interactions/Transitions 

Publications relating to this project (Kahana & Sekuler, 2002; Zhou et al., 2004)
and our many presentations at professional meetings have attracted considerable interest
among vision researchers and among researchers interested in memory. As a
result of learning about our work and discussions with the PI and his collaborators,
Rob Nosofsky (Indiana University), a leading researcher in categorization and memory,
has replicated key aspects of NEMo, our model for episodic recognition memory
(Nosofsky & Kantner, 2005). In addition, following up the AFOSR Program Review,
the PI launched a small-scale effort to explore the online visuospatial memory
paradigm presented at the Arizona Program Review by Don Lyon and Kevin Gluck
(Air Force Research Laboratory). To facilitate this effort, Lyon generously gave the
PI access to task software developed at AFRL; it is expected that this effort will
continue even though AFOSR support has ended.